Size & Power Series II: Walk Like a Statistician

Welcome to Article II in this 5-part article series! You can find the other articles in this series by clicking the links below!

Ok! In Article I, we discussed how stats logic isn’t really as foreign as we may have first thought. Now that we’ve cleared that up, we’re about a third of the way, conceptually, to having a conversation about power and ideal sample size, I promise! But we still have a few more concepts to unpack with respect to how a statistician actually uses that logic. To do that, we’re going to have to broaden our thought experiment from the first article about stomach aches to something a bit more realistic and interesting.

Picture this: Your stomach ache investigation has sparked your curiosity. You recognize that some stomach aches are almost certainly caused by hunger (i.e., the true % of all stomach aches caused by hunger is not 0%), but also that some stomach aches are almost certainly not caused by hunger (i.e., the true % of all stomach aches caused by hunger is not 100%). So, that realization leads you to wonder: “What % of all stomach aches are caused by hunger?”

One key difference between this question and the simpler one from our first article is that we are no longer interested in a single stomach ache but rather in all stomach aches. In Statistics Land, we’d call this group of all stomach aches the population: every “potential subject” relevant to our question, given its stated scope. That is, we can’t answer the question “What % of all stomach aches are hunger-related?” without considering all stomach aches that have ever occurred, right?

What quickly becomes obvious, with a question like this one (or with so many of the questions we ecologists might ask), is that trying to answer an important question about a population at the level of the population is utterly intractable. Imagine trying to even document all the stomach aches occurring in your town on a single day, let alone all stomach aches occurring everywhere in the world! You’d never be done, you’d have to annoy an awful lot of people just to see if they might have a stomach ache, you wouldn’t get to the bottom of the causes of most of those stomach aches, you’d never be totally sure how many stomach aches you were missing, etc., etc. The answer to your question will never come!

…Could also describe the graduate student experience pretty well…

What can we do, then, to explore this question? After all, scientists explore questions on a similar scale to this one all the time! Answer: We shrink (and/or abstract) the problem. We collect meaningful data from a small and carefully selected number of relevant subjects, and we get an “answer” to our question at the level of this group.

For example, we might track down 30 stomach ache cases and document their causes. We might find out, in doing this, that 6 of them were caused by hunger, so our answer to the question of “what percentage of stomach aches are hunger-related” for this group was 6/30 = 20%. In Statistics Land, we’d call this small group our sample and this sample-level answer to our question our sample statistic (hence the name “statistics!”).

Then, we have a pivotal next question to ask: How similar to the entire population was our sample? If we think our sample was just like the population in every important way (just smaller), we might then use our statistic to make an “informed prediction” as to what the answer to our question might be at the level of the population. We can call our population-level answer the parameter of interest, so we can rephrase all of this by asking: “What is the most likely value for the parameter, given the statistic we got?”

Don’t worry, it’ll get easier…I hope…

As we try to answer that question, there’s an incredibly important logical assumption we will make that much of statistics (as we will discuss it, anyway) rests upon: The more similar our sample is to the population of interest, the more alike the statistic and the parameter should be. This should make a lot of sense, actually! If we take this logic to its extreme, the merits of this assumption become brutally obvious: a sample exactly the same size as the population (i.e., the sample and the population are the same thing) would produce a statistic exactly equal to the parameter without fail! For example, the average of the entire group must be equal to the average of the entire group, right? Right.

Ok, I guess I lied: nothing, not even that mean(X) = mean(X), is obvious, our lives are lies, and we can’t have nice things…

However, what happens if we move to the opposite extreme? A sample of 1 subject is much less capable of being exactly like the population (if it even can be at all). In our stomach ache example, we’d get either a statistic of 0% (the one stomach ache we looked at was not hunger-related) or 100% (it was hunger-related), and those are the only two options. If the true parameter were really around 50% (half of all tummy aches are hunger-related), our statistic and our parameter would be really far off from each other no matter which sample of 1 we took! But even just doubling the sample to two subjects would significantly improve things: it’d at least be possible to get a statistic of 50% (and, if the parameter were exactly 50%, it’d happen in half of all such random samples)!
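For readers who like to see the arithmetic, here’s a small Python sketch that enumerates every possible sample of size 1 or 2 and tallies how likely each statistic is. It assumes each case we draw is independently hunger-related with probability equal to the parameter (a reasonable approximation when the population is huge), and the 50% parameter is the hypothetical one from the paragraph above. The function name `sampling_distribution` is just an illustrative label.

```python
from fractions import Fraction
from itertools import product

def sampling_distribution(n, p):
    """Exact probability of each possible sample statistic for a random sample
    of n cases, assuming each case is independently hunger-related with
    probability p (the parameter)."""
    dist = {}
    # Enumerate every possible sequence of outcomes (1 = hunger-related, 0 = not).
    for outcome in product([0, 1], repeat=n):
        prob = Fraction(1)
        for x in outcome:
            prob *= p if x else 1 - p
        stat = Fraction(sum(outcome), n)  # the sample statistic (proportion)
        dist[stat] = dist.get(stat, 0) + prob
    return dist

half = Fraction(1, 2)  # hypothetical parameter: exactly 50%
for n in (1, 2):
    for stat, prob in sorted(sampling_distribution(n, half).items()):
        print(f"n={n}: statistic {float(stat):.0%} with probability {float(prob):.0%}")
```

With one subject, the only possible statistics are 0% and 100%; with two, the statistic lands exactly on 50% half the time, matching the reasoning above.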

We just established another important concept there! As we increase the size of our sample, we become more and more likely to get a sample that is a (more or less) perfect microcosm of the population we’re interested in, and thus to get a statistic similar to the parameter. I hope this “rule” is just intuitive for you because, if not, the reasons behind it are a little hard to explain. The way I like to think about it is this: “Oddities” and random chance (things that can cause a sample to become dissimilar to the population) tend to get increasingly overwhelmed by “predictable processes” and “normalcy” as we make a sample larger, so our statistic will tend to gravitate towards the “truth.”
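If the “rule” isn’t intuitive yet, a quick simulation can make it visible. The sketch below draws many random samples of several sizes from a hypothetical population in which exactly 50% of stomach aches are hunger-related (a made-up parameter, not real data), then reports how spread out the resulting statistics are. The names `simulate_statistics` and `TRUE_P` are illustrative, and each case is assumed to be drawn independently.

```python
import random
import statistics

def simulate_statistics(p, n, reps, rng):
    """Draw `reps` random samples of size n from a hypothetical population in
    which a fraction p of stomach aches are hunger-related; return the sample
    proportion (statistic) from each sample."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(42)  # fixed seed so the run is reproducible
TRUE_P = 0.5  # hypothetical parameter, chosen for illustration only
for n in (1, 2, 10, 30, 100, 1000):
    stats = simulate_statistics(TRUE_P, n, reps=1000, rng=rng)
    # Share of samples whose statistic lands within 5 percentage points of the truth
    close = sum(abs(s - TRUE_P) <= 0.05 for s in stats) / len(stats)
    print(f"n={n:>4}: spread (SD) = {statistics.stdev(stats):.3f}, "
          f"within 5 points of truth = {close:.0%}")
```

As n grows, the spread of the statistics shrinks and a larger share of samples land close to the parameter, which is the “gravitating towards the truth” described above.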

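What I just described is so fundamental and predictable a process that it’s captured by not one but two famous theorems. The Law of Large Numbers says that a sample statistic (like our percentage) homes in on the parameter as a random sample grows, and its close cousin, the Central Limit Theorem (CLT), adds that the statistics from many such samples pile up in a predictable bell-shaped curve centered on the parameter, with a spread that shrinks as the samples get bigger. As it turns out, in many cases, even a sample of just ~30-50 subjects drawn thoughtfully from a population of millions can give you a surprisingly good guess at the parameter! That’s the “miracle of probability” for you!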
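To put a rough number on “surprisingly good,” one common shortcut uses the standard error of a proportion, the square root of p(1 − p)/n, which describes the spread of that bell-shaped curve. The Python sketch below plugs in the 6-out-of-30 sample from earlier. The ±1.96 standard-error range is the textbook normal approximation (a Wald-style interval), offered here only as an illustration; it gets shaky for small samples or proportions near 0% or 100%, and better intervals exist. `standard_error` is an illustrative helper name.

```python
import math

def standard_error(p, n):
    """Approximate standard deviation of the sample proportion across many
    random samples of size n, when the population proportion is p."""
    return math.sqrt(p * (1 - p) / n)

# Plug in the sample statistic from the running example (6 of 30 = 20%).
p_hat, n = 6 / 30, 30
se = standard_error(p_hat, n)

# Under the CLT's normal approximation, roughly 95% of sample statistics fall
# within about 1.96 standard errors of the parameter, which motivates this
# rough range of plausible parameter values.
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"SE = {se:.3f}; rough 95% range: {low:.0%} to {high:.0%}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error, which is one reason sample-size questions come up so often.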

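This same averaging logic is also the culprit behind the so-called “Wisdom of the Crowd,” which says that while we may individually be bad at guessing, collectively, our average guess will tend to be centered around the truth (so long as our individual errors don’t all lean the same way).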
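Here’s a toy Python sketch of that idea, under loudly stated assumptions: each person’s guess at a hypothetical true value of 50% equals the truth plus their own random error (normally distributed, SD of 20 points), plus an optional bias that everyone shares. The numbers and the `crowd_average` name are made up for illustration.

```python
import random
import statistics

def crowd_average(truth, crowd_size, noise_sd, shared_bias, rng):
    """Average of crowd_size guesses, where each guess is the truth plus a bias
    shared by everyone plus that person's own random error (SD = noise_sd)."""
    guesses = [truth + shared_bias + rng.gauss(0, noise_sd) for _ in range(crowd_size)]
    return statistics.mean(guesses)

rng = random.Random(3)
TRUTH = 50  # hypothetical true % of hunger-related stomach aches
one_person = crowd_average(TRUTH, 1, noise_sd=20, shared_bias=0, rng=rng)
crowd = crowd_average(TRUTH, 1000, noise_sd=20, shared_bias=0, rng=rng)
biased_crowd = crowd_average(TRUTH, 1000, noise_sd=20, shared_bias=10, rng=rng)
print(f"one guess: {one_person:.1f}, crowd: {crowd:.1f}, biased crowd: {biased_crowd:.1f}")
```

Individual errors mostly cancel out in the crowd’s average, but a shared bias does not: the biased crowd settles near 60 rather than 50, no matter how many people join.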

However, the key word in that promise about ~30-50 subjects is “thoughtfully.” The CLT has an important caveat: it only steers us toward the parameter so long as our sample is representative. The word “representative” is doing some seriously heavy lifting here, but to simplify, this is a fancy way of saying “the sample isn’t systematically unlike the population in some way.” This caveat also makes intuitive sense, I hope! If, in reality, 99% of stomach aches are hunger-related, but we purposefully chose a sample that only included non-hunger-related stomach aches, then even if that sample were huge, we would get an “answer” really far from the “truth!” That’s a nugget of wisdom worth remembering: The CLT is immensely powerful, but it can do little to protect us from ourselves and our capacity to collect bad samples.
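A small simulation shows why a bigger sample can’t rescue a biased one. In this Python sketch, all numbers are hypothetical: the true parameter is 50%, but our (made-up) collection method only manages to record one in four hunger-related stomach aches while recording every non-hunger one. `biased_sample_statistic` is an illustrative name, not a standard function.

```python
import random

def biased_sample_statistic(p, n, keep_hunger, rng):
    """Collect n cases from a hypothetical population where a fraction p of
    stomach aches are hunger-related, but a hunger-related case only makes it
    into the sample with probability keep_hunger (non-hunger cases always do).
    Returns the sample proportion of hunger-related cases."""
    hunger = collected = 0
    while collected < n:
        is_hunger = rng.random() < p
        if is_hunger and rng.random() >= keep_hunger:
            continue  # this hunger-related case slipped past the biased method
        collected += 1
        hunger += is_hunger
    return hunger / n

rng = random.Random(11)
TRUE_P = 0.5        # hypothetical parameter
KEEP_HUNGER = 0.25  # hypothetical: the method catches only 1 in 4 hunger cases
for n in (30, 300, 3000, 30000):
    stat = biased_sample_statistic(TRUE_P, n, KEEP_HUNGER, rng)
    print(f"n={n:>5}: statistic = {stat:.1%}")
```

With these made-up settings, hunger cases enter the sample at a rate of 0.5 × 0.25 = 0.125 versus 0.5 for non-hunger cases, so the statistic settles near 0.125 / 0.625 = 20% rather than 50%, however large n gets. More data just makes us more confident in the wrong answer.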

Here’s the important question we just begged, so to speak: How could we get an unrepresentative sample? To simplify, there are three common ways:

  1. Our sample is just too small (and the world is very quirky/random). We kind of already established this! The smaller our sample, the more “random chance” and “quirky cases” can push our sample away from the population as a whole. You can think of this as the logic of the CLT but in reverse: If a sample gets more like the population the bigger it gets, it makes sense that it may also get less like the population the smaller it gets. There’s no guarantee a small sample will be unrepresentative, but it increases the odds.
  2. Our sample is biased. Gasp! I said a bad word! Bias, intentional or not, is the antithesis of good science, and even powers as great as the CLT can’t save us from it! Here, I’ll call a biased sample one collected in any way, intentional or not, that favors inclusion of some subjects from the population over others in a way that runs parallel to our hypothesis (either for or against). That’s a bit clunky, so here’s an example: If we collect stomach-ache data only from people suffering stomach aches while inside restaurants, where hunger should be a less common cause than in most other locations, we are probably going to get a sample that suggests hunger-related stomach aches are a lot less common than they really are, right? If you want to perform bad science, intentionally collecting a biased sample favorable to your hypothesis is one of the easiest ways to do it!
  3. Our sample didn’t cut across subgroups in the population. This is sort of related to bias, but I think of it as the same problem wearing a different hat. What if we only collect a sample of stomach ache cases from here in the US? What problems might this cause? Well, for various reasons, hunger is less of an issue here in the US than in other parts of the world. Americans are a distinct subgroup of all humans; by ourselves, on this issue at least, we cannot be a representative sample of all humans!

We get around these problems by: 1) Taking large(-ish) samples, 2) Being aware of biases (unconscious or not) and working to eliminate them with our study design (e.g., by “blinding” our study so we don’t know which subjects are getting which treatments), 3) Identifying key subgroups in the population beforehand and sampling across them (e.g., “blocking” in experiments), and most importantly 4) Using randomness to construct our samples so that bias and subgroups are less of a problem (or temptation!).
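To see the subgroup problem and two of these fixes side by side, here’s a Python sketch built on an entirely invented population: 100,000 stomach aches split into two subgroups with different (made-up) hunger rates. It compares a sample drawn from one subgroup only, a simple random sample, and a stratified sample that draws from each subgroup in proportion to its size (the survey-sampling analogue of the “blocking” idea). The group labels, sizes, and rates are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical population: the sizes and hunger rates below are invented for
# illustration, not real data.
rng = random.Random(7)
SUBGROUPS = {"US": (30_000, 0.10), "elsewhere": (70_000, 0.40)}  # (size, hunger rate)
population = [
    (group, rng.random() < rate)  # each case: (subgroup, is hunger-related?)
    for group, (size, rate) in SUBGROUPS.items()
    for _ in range(size)
]

def proportion(cases):
    """Fraction of cases that are hunger-related."""
    return sum(h for _, h in cases) / len(cases)

parameter = proportion(population)
n = 500

# 1) Sample drawn from one subgroup only
us_only = rng.sample([c for c in population if c[0] == "US"], n)
# 2) Simple random sample: every case equally likely to be picked
srs = rng.sample(population, n)
# 3) Stratified sample: sample within each subgroup in proportion to its size
stratified = []
for group, (size, _) in SUBGROUPS.items():
    members = [c for c in population if c[0] == group]
    stratified += rng.sample(members, round(n * size / len(population)))

print(f"parameter: {parameter:.1%}")
for label, sample in [("US only", us_only), ("simple random", srs), ("stratified", stratified)]:
    print(f"{label:>13}: {proportion(sample):.1%}")
```

The one-subgroup sample lands near that subgroup’s own rate, while the random and stratified samples both land near the population’s parameter; stratifying simply guarantees each subgroup gets its fair share of the sample rather than leaving that to chance.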

So, across the two articles in this series so far, we’ve seen that statisticians use age-old logic to think about which explanations are best while acknowledging we can never be certain that any explanation is definitely correct. We’ve also seen that, to find explanations for large problems, statisticians try to make those problems smaller, which generally works quite well so long as they do it carefully.

In the next article, we’ll apply all these ideas to our question about the true % of hunger-related stomach aches, and we’ll take advantage of one of my favorite tools to do it: simulation! You can get to that third article here.