Size & Power Series V: The Power and the Glory (and the Sample Size)

Welcome to Article V in this 5-part article series! You can find the other articles in this series by clicking the links below!

We’ve made it to the fifth and final article in this series! In the last article, we met a workhorse of modern statistics, the Confidence Interval, and we saw how it mapped onto the approach statisticians use to make more objective decisions within a context of uncertainty. Importantly, we also saw the equation (such as it is) for calculating a Confidence Interval, as well as the shortcut math we can use to calculate a standard error to put inside of the CI equation.

Let’s begin this article where we left off the last one: By examining the equation we use to build a Confidence Interval. That equation was: Statistic +/- [some fudge factor]. Over the course of that article, we amended that equation slightly to: Statistic +/- (# of standard errors needed for some Confidence level * the size of our standard error).

While we saw that we can get a standard error to plug into this equation by doing some “imagining,” we also saw that we could get one by just doing a math problem (assuming some assumptions were met!). That math problem was: Standard error = sqrt( p*(1-p) / n ), where p is the true proportion (or, in our case, what we think it might be–usually, our sample statistic) and n is our sample size.
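To make that concrete, here’s a quick sketch in R of both pieces. The numbers are purely hypothetical stand-ins (a made-up sample of 50 with a sample proportion of 0.30, and roughly the 1.28 standard errors that an 80% Confidence Level calls for), not anything from our actual data:

#A quick sketch: the shortcut standard error, then the CI equation, with made-up numbers.
p_hat = 0.30   #our (hypothetical) sample statistic
n = 50         #our (hypothetical) sample size

#The "math problem" version of the standard error: sqrt( p*(1-p) / n )
(SE = sqrt(p_hat * (1 - p_hat) / n))

#Statistic +/- (# of standard errors * SE); with ~1.28 SEs, this lands at roughly 0.217 to 0.383
p_hat + c(-1.28, 1.28) * SE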

Bear with me here–the juice will be worth the squeeze!

So, more generally, if we substitute, we could calculate a CI by doing: Statistic +/- (# of standard errors needed for some Confidence level) * (sqrt( p*(1-p) / n )). It may not seem like it, but this is actually a really significant shift–we just introduced sample size into our equation for our Confidence Interval.

Why does that matter? We can use algebra to re-arrange this equation so that n is by itself; we can solve for sample size. It’s a little ugly to actually do this, so I’ll just tell you that the result is this: n = (# of standard errors needed for some confidence level / Margin of Error)^2 * p * (1-p).

The observant members of the audience will have noticed that this has introduced a heretofore unseen term into our equation: the Margin of Error [Sidenote: “error” must be a statistician’s favorite word–it comes up a LOT!]. This is half the final width of the Confidence Interval, aka the distance between the statistic and either the upper or lower bound. So, if your Confidence Interval spans 6 percentage points, your Margin of Error would be half that: 3 percentage points.
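If you’d like to play with that rearranged equation yourself, here’s a minimal sketch of it wrapped up as a little R helper. The function name and argument names are just my own labels (z here is the # of standard errors needed for your Confidence Level), not something from any package:

#The rearranged equation as a small helper function:
#n = (# of standard errors needed for some confidence level / Margin of Error)^2 * p * (1-p)
sample_size_for_prop = function(z, ME, p) {
  (z / ME)^2 * p * (1 - p)
}

#Sanity check with the classic survey setup: 95% Confidence (~1.96 SEs), a 5% Margin of
#Error, and p = 0.5 gives the famous "about 385 people" answer.
sample_size_for_prop(z = 1.96, ME = 0.05, p = 0.5)
[1] 384.16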

Even if its construction is foreign-looking to you, the logic of this equation actually makes a lot of sense. Let’s break down this equation, term by term:

  • # of standard errors needed for some confidence level–this is in the “top” of the right side, so as this number goes up, so too does sample size. This makes sense–if we need to go out “wider” (more standard errors) just to maintain the same level of confidence in the Interval we make, samples must be very different from one another, such that each new one only gets us a little closer to the truth. We’re going to need more information to keep the same level of certainty in that instance.
  • Our Confidence Level is in that term too. It’s functionally in the top of the right side–if we want to be even more Confident that our Interval will capture the truth, we need more information, and the sample size will need to go up.
  • p, our best guess for whatever the truth is, is also in the top of the right side, but so is 1-p, so this one is a little quirky. As it turns out, this whole term (p*(1-p)) gets larger the closer p gets to 0.5 and smaller the closer it gets to 0 or 1. So, for reasons I won’t explain here, the closer your event gets to a “50-50 chance,” the larger your sample size would need to be to produce the same-sized CI. A “coin flip” is going to require more info to be certain about than a “sure thing” would!
  • Lastly, we have Margin of Error. This, unlike all the other terms, is in the “bottom” of the right side. If you are comfortable with a wider Confidence Interval (which is the same thing as saying a more non-committal one), you can make do with a smaller sample.

So, how liberal or conservative we want to be factors into this equation in two places: in the Confidence Level we choose (what is the long-run average probability that the CI will contain the parameter?) and in our Margin of Error (how large or small is the “fudge factor” I’m putting around my statistic?). Meanwhile, how chaotic the world is–and thus how much samples will vary and how much info about the “truth” they will each carry–also factors into this equation via the term p*(1-p).

The upshot is that if we can decide how liberal/conservative we want to be and can estimate how chaotic we suspect the world will be, we can get a sample size that, on average, will yield a CI as big as we want and “right” as often (in the long-run sense) as we want.
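To see those knobs in action, here’s a quick exploration using the little helper function we sketched above (again, just a sketch; the specific inputs are arbitrary). Turning each knob one at a time, while holding the others fixed, pushes n around exactly the way the term-by-term breakdown predicts:

#Turning each knob one at a time:
sample_size_for_prop(z = 1.65, ME = 0.035, p = 0.25)  #a baseline: ~417
sample_size_for_prop(z = 1.96, ME = 0.035, p = 0.25)  #more Confidence -> n jumps to 588
sample_size_for_prop(z = 1.65, ME = 0.070, p = 0.25)  #double the Margin of Error -> n drops to ~104
sample_size_for_prop(z = 1.65, ME = 0.035, p = 0.50)  #a "coin flip" -> n climbs to ~556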

This will probably be easier to think about with an example. Let’s say we want to know the true % of stomach aches that are hunger-related. Let’s assume we are confident that the true p is around 0.25 (25% of all stomach aches are hunger-related). At the end of our study, any Interval we build that’s wider than 7 percentage points feels too weak-sauce for us, so we’ll use half that (0.035) as our Margin of Error. However, we also want to be 90% Confident the interval we do build contains the truth, which will take 1.65 standard errors to achieve.

Let’s pause here to notice that, in this instance, we kind of want to have our cake and eat it too–we don’t want a very wide interval when we’re done (it’s precise), but we want it to have a high likelihood of containing the truth anyway (it’s also accurate). Well aren’t we greedy! After all, in the last article, we built a CI using a lower Confidence Level (80%) and a sample size of 50 that ended up being about 15 percentage points wide (twice the width of the one we’re now building). Getting an interval half that wide but even more likely to be accurate is going to require a lot more information to pull off–we can already expect the “ideal sample size” to achieve this result will be large. Let’s see if we’re right.

#Calculating n for when ME = 0.035, p = 0.25, and Z = 1.65 (called a critical value)
(n = ((1.65/0.035)^2) * 0.25 * 0.75)
[1] 416.7092

So, according to our equation, anyway, we’d need a sample of ~417 stomach aches to up our Confidence to 90% from 80% and to cut our Margin of Error roughly in half from 7.5% to 3.5%. This is something like 8 times the sample we needed to get our first result! Yikes.
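(One small practical note: the equation spits out a fractional answer, and since we can’t sample 0.7 of a stomach ache, the usual convention is to round up to the next whole subject.)

#We can't recruit a fraction of a stomach ache, so round up to the next whole subject.
ceiling(416.7092)
[1] 417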

Let’s use simulation to see if that result holds up: Let’s draw 1000 random samples of size 417 from a population with a true p = 0.25. Let’s then calculate a 90% CI for each one. After that, let’s see what proportion of these Intervals contain the true parameter and also what their average Margins of Error were.

#Simulating 1000 random samples from a population with p = 0.25, 90% Confidence, and n = 417.
rand_samples1000_v3 = do(1000)*{
  new_sample = sample(c("Hunger-related", "Not-hunger-related"), 
                      size = 417, 
                      replace = TRUE,
                      prob = c(0.25, 0.75)
  )
  prop(new_sample, success="Hunger-related")
}

#Calculate our standard error using our shortcut equation
(SE = sqrt((0.25*0.75)/417))
[1] 0.02120472

#Use mutate to create two new columns--one for our upper bounds of our CIs and one for our lower bounds. 
rand_samples1000_v3 = rand_samples1000_v3 %>% 
  mutate(CI.upper = prop_Hunger.related + (1.65*SE),
         CI.lower = prop_Hunger.related - (1.65*SE))

#Make a new column, accurate, that is "Yes" if the CI contains the true value of 0.25
rand_samples1000_v3$accurate = "No" #We'll overwrite these Nos where we need to.
rand_samples1000_v3$accurate[rand_samples1000_v3$CI.upper >= 0.25 & 
                             rand_samples1000_v3$CI.lower <= 0.25] = "Yes"

#Take a peek at what we've made.
head(rand_samples1000_v3)
  prop_Hunger.related  CI.upper  CI.lower accurate
1           0.2278177 0.2628055 0.1928300      Yes
2           0.2206235 0.2556113 0.1856357      Yes
3           0.2829736 0.3179614 0.2479858      Yes
4           0.2206235 0.2556113 0.1856357      Yes
5           0.2398082 0.2747959 0.2048204      Yes
6           0.2086331 0.2436209 0.1736453       No

#Calculate the proportion of successful intervals.
prop(~rand_samples1000_v3$accurate, success = "Yes")
prop_Yes 
   0.903 

#What was the average Margin of Error? Take means of the upper and lower bounds, subtract, then divide by 2.
(mean(rand_samples1000_v3$CI.upper) - mean(rand_samples1000_v3$CI.lower)) / 2
[1] 0.03498779

#Note that that is the same as 1.65 * our SE
1.65 * SE
[1] 0.03498779

So, what did we learn here? 90.3% of the Intervals we’d have made from these 1,000 samples of size 417 would have contained our parameter (pretty darn close to 90%!), just as we had planned it. Their Margins of Error were 0.035 almost exactly, just as we had planned it. So, with each sample, we’d be able to report a guess about the truth with just 3.5% fudge factor on either side, just as we had planned it, and we’d be right most of the times we did that, just as we had planned it.

Would some of our Intervals have failed? Yes, about 10% of them–just as we had planned it. We can’t eliminate that risk of failure unless we want to draw an Interval of 100% Confidence, which would get us right back to crafting something so wishy-washy we might as well not have bothered!

So, we’ve seen how we can use our Confidence Interval equation to find an “ideal sample size,” one that would seemingly, on average, help us achieve the precision and accuracy we’re comfortable with given the reality we expect to encounter. Here, we focused on proportions, but the same trick works for means, differences in proportions, differences in means, correlations, etc. Now that you understand what is happening, you can use calculators online like this one to experiment with different choices and situations. And, if you find yourself in a situation one of these calculators can’t handle, you can always try simulating your way to an answer!
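If you’re curious what that looks like for a mean, here’s a rough sketch of the analogous rearrangement. It assumes you can also supply a guess for the standard deviation of whatever you’re measuring; the function name and inputs are my own illustration, not something we derived in this series:

#A sketch of the same trick for a mean: n = (# of SEs * guessed standard deviation / Margin of Error)^2
sample_size_for_mean = function(z, ME, sd_guess) {
  (z * sd_guess / ME)^2
}

#e.g., 95% Confidence (~1.96 SEs), a Margin of Error of 2 units, and a guessed sd of 10
sample_size_for_mean(z = 1.96, ME = 2, sd_guess = 10)
[1] 96.04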

In this article series, I’ve essentially built you a sword, given it to you, taught you how to swing it, and explained why you might want to. However, it would be irresponsible of me to do that and then not ask: Should you swing this sword, just because you can?

What could possibly go wrong?

Let’s close with a list of cognitive traps one might fall into when using these “ideal sample size” calculations.

  • Assuming that using the sample size one calculates assures them a particular outcome: A sample will be close to the truth, or it won’t be. A 90% Confidence Interval will contain the truth, or it won’t. Take a good sample, and you will still come up bust about 10% of the time. We only ever operate in a random world and in the context of uncertainty. All these calculations do is give you a sense of what should work over the long run, not what will definitely work when you step up to roll your dice! If you go into using these calculations thinking that what you will get out is the sample size needed to confirm your hypothesis, you’re chasing miracles!
  • Assuming that the selection of a sample size in this way is “objective:” When someone goes searching for an “ideal” sample size, what they sometimes really are searching for is the “right” sample size. There is no such thing. First, the “right” sample is not a sample at all–it’s a census of the entire population. As soon as you decide you must rely on a sample, you’re taking a chance that what you will learn about your sample will not hold for the population! Second, when we rearrange our CI equation to solve for n, we take something that is usually used “objectively” (plugging in a p, an n, an SE, and a critical value we know for certain because we’ve already gotten or chosen them) and force it to be dependent on our subjective choices. What p do you expect? What Margin of Error do you want? What Confidence Level are you comfortable with? These are not strictly objective questions (probably), so no answer you will get back from this exercise will be either!
  • Assuming that the sample size one calculates is the sample size one then has to use: In an earlier article in this series, I said that simulations are fantastic tools. I should have said they are fantastic at least to the extent that they mimic reality. They are great for exploration, but would I bet my life on their predictions? Unless I was positive they were constructed properly, probably not! Consider the sample sizes you get from these equations to be “explorations from a simulation.” Not only do you not have to use them (after all, they are not “right” in any truly defensible way!), but you also shouldn’t blindly trust them either. Which brings me to my next point…
  • Assuming that having an “ideal sample size” protects you from bad science: In an earlier article in this series, I noted that so much of the logic and tools we statisticians use are based on the assumption that our sample is representative. These tools are no different–if you take a biased sample, or one that over- or under-represents subgroups, or is confounded in some other way, your “ideal sample size” won’t save you! Sample size is important, but sample construction is even more important! A perfectly sized, crappy sample will give you a perfectly sized crappy answer every time.
  • Assuming the standard Confidence Interval rules always apply: This one is technical, so I won’t go into the details here, but here’s the gist: Remember the CLT? Remember Normal Curves? These equations assume that, if you took a boatload of samples, the CLT would kick in and produce a Normal Curve of statistics. That’s usually a good assumption. Sometimes, it’s not. The most common problem for this assumption is non-independence–your sample subjects are more/less similar to each other than they “should” be if they were truly random. Think siblings, or birds from the same nest. If you have any reason to think this “Normality Assumption” wouldn’t hold for you, beware the answers these equations could provide.

If you want to know more about the considerations surrounding sample sizes, this paper looks like a good resource!

There you have it! We’ve marched a long way over the course of these five articles—did you make it through ok? Were there parts that were particularly interesting or confusing for you? Do you want to chat about the issues I’ve presented here? If so, please leave a comment below! I hope this article series helps you feel more prepared to craft a study–and a sample–that will serve you well in your exploration of the natural world!
