
When it comes to sampling, your biggest concern should be selecting a sample that is representative of the population you are trying to learn about. When your sample is representative, you are getting a good read on what you are trying to study (assuming your study design is sound), which means your results should roughly replicate from study to study. On a statistical level, it also means you can feel fairly confident generalizing your results to the larger population you are trying to study.
How do you make sure that happens?
RANDOM SAMPLING. In theory, a random sample should be representative of the population you are trying to study. But a simple random sample means that every member of the population has an equal chance of being included, which is rarely the case in reality: scientists and practitioners face a variety of constraints when they sample a population, including access, modes of communication, funds, and participant interest.
Let’s say you want to study “women,” “American voters,” or “people with a college degree.” Each of those populations numbers in the millions; it’s difficult to reach that many people and, even if you did, people vary in how likely they are to take part in research studies. As a result, a sample is almost always an imperfect representation of the population you are trying to study. Whether it skews younger or older, more male or more female, or toward lower or upper socioeconomic classes, a sample is bound to stray from its underlying population in some ways, unless you are studying a very small population with equal access to all eligible participants. So aim to make your sample selection as random as possible.
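For a sense of what “as random as possible” means mechanically, here is a minimal Python sketch of simple random sampling: a draw without replacement from a complete list of the population (a sampling frame). It assumes such a list exists, which is exactly what the constraints above usually take away; the `frame` of IDs and the `simple_random_sample` helper are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member has the same
    chance of inclusion, n / len(population)."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: IDs standing in for a full population list.
frame = [f"person_{i}" for i in range(10_000)]
sample = simple_random_sample(frame, 500, seed=42)
print(len(sample), len(set(sample)))  # 500 500
```

Seeding the generator is optional, but it lets you document and reproduce exactly how a sample was drawn.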
RANDOM ASSIGNMENT. Since samples will inevitably fall short of truly random, you want to randomly assign participants to study groups (e.g., control and treatment). That way, any demographic or personal quality that is over- or under-sampled should, on average, be equally represented across all of your study conditions, which keeps outside influences from distorting your read on the outcome variable. Sometimes completely random assignment is not possible, for example when you need a range of ages represented in your sample but your available research pool has more or fewer people who meet a given criterion. In that case you may need a stratified sample or blocked randomization to ensure your sample has enough people of interest in it to draw conclusions.
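A minimal Python sketch of the two approaches just described: simple random assignment, and block randomization, where assignment is randomized separately within each block (here, a hypothetical age group) so that every block is split as evenly as possible across conditions. The function names, condition labels, and toy pool are illustrative assumptions, not a standard API.

```python
import random
from collections import Counter, defaultdict

def simple_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn
    so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

def block_assign(participants, block_of, conditions, seed=None):
    """Randomize separately within each block so each block is split
    across conditions as evenly as its size allows."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for p in participants:
        blocks[block_of(p)].append(p)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        # Random starting offset so odd-sized blocks don't always give
        # the extra person to the first condition.
        offset = rng.randrange(len(conditions))
        for i, p in enumerate(members):
            assignment[p] = conditions[(i + offset) % len(conditions)]
    return assignment

# Hypothetical pool: 90 people, one-third of them aged 18-34.
people = list(range(90))
age_group = {pid: "18-34" if pid % 3 == 0 else "35+" for pid in people}

assignment = block_assign(people, age_group.get, ["control", "treatment"], seed=1)
print(Counter((age_group[p], assignment[p]) for p in people))
```

With these block sizes, each age group splits exactly in half (15/15 and 30/30) whatever the seed, whereas simple assignment only balances age groups on average.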
You can also use statistical weights to make your sample look more like the group you are trying to study. Weighting works best when your sample is not wildly unrepresentative, because statistical correction can only go so far to compensate for sampling problems. Heavily upweighting a small group of respondents, who may or may not be representative of their peers, can have meaningful consequences for interpretation: it lowers the chance that the data give a good read on that group, and it can ultimately mean the findings have no real predictive value.
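One common way to build such weights is post-stratification on a single variable: each respondent is weighted by their group’s share of the population divided by that group’s share of the sample. The Python sketch below assumes you know the population shares (the 40/60 split is made up for illustration). It also reports Kish’s approximate effective sample size, (sum of weights)² / (sum of squared weights), which shrinks as weights become more uneven; that shrinkage is one concrete way to see the cost of overweighting a small group.

```python
from collections import Counter

def poststratification_weights(groups, population_shares):
    """Weight each respondent by population share / sample share of
    their group, so the weighted sample matches the population on it."""
    n = len(groups)
    sample_counts = Counter(groups)
    return [population_shares[g] / (sample_counts[g] / n) for g in groups]

def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical sample of 100: 70 college graduates, 30 non-graduates,
# from a population assumed to be 40% graduates and 60% non-graduates.
groups = ["college"] * 70 + ["no_college"] * 30
shares = {"college": 0.4, "no_college": 0.6}

weights = poststratification_weights(groups, shares)
print(round(weights[0], 3), round(weights[-1], 3))  # 0.571 2.0
print(round(kish_effective_n(weights), 1))          # 70.0
```

Here 100 weighted respondents carry roughly as much information as 70 unweighted ones; the more extreme the weights, the larger that loss.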
RANDOMIZATION CHECKS. Once you’ve randomized your sample, conduct randomization (or balance) checks to ensure that your sample looks like the population you want to draw conclusions about and that your study conditions look like each other. This is your chance to make sure that important demographics that tend to shape perspectives and responses are accounted for in your study, possibly including age, race, gender, socioeconomic status, education level, marital status, and parental status. These are your covariates, or control variables, and they will help you home in on your outcome more clearly. Check that each study condition has approximately the same number of people from each group of interest, and that the conditions share a similar range of values, similar means, and similar spread (standard deviations or standard errors). If you notice stark differences in these values between conditions, run a regression model with that covariate predicting your study condition variable, and see whether the condition you have assigned people to is already significantly tied to that control variable. If it is, you likely need to step back, do a block randomization on that covariate, and redo your randomization checks to ensure other variables did not become problematic.
Randomization checks are a small price to pay to avoid spending months waiting for results, only to discover that something you could have accounted for on the front end is influencing them to an extreme degree.
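The balance checks described above can be sketched in Python. Instead of the regression the text recommends, this minimal stdlib-only version compares each covariate across two conditions using counts, means, standard deviations, and the standardized mean difference (SMD): the difference in means divided by a pooled standard deviation. The 0.1 flagging threshold is a commonly used rule of thumb rather than a fixed standard, and the column names and toy data are hypothetical.

```python
import statistics

def smd(treated, control):
    """Standardized mean difference: (mean_t - mean_c) / pooled SD,
    pooling by averaging the two sample variances."""
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    diff = statistics.mean(treated) - statistics.mean(control)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd

def balance_check(rows, covariates, condition_key="condition",
                  treated="treatment", control="control", threshold=0.1):
    """Return {covariate: (n_t, n_c, mean_t, mean_c, sd_t, sd_c, smd)}
    and the list of covariates whose |SMD| exceeds the threshold."""
    report, flagged = {}, []
    for cov in covariates:
        t = [r[cov] for r in rows if r[condition_key] == treated]
        c = [r[cov] for r in rows if r[condition_key] == control]
        d = smd(t, c)
        report[cov] = (len(t), len(c), statistics.mean(t), statistics.mean(c),
                       statistics.stdev(t), statistics.stdev(c), d)
        if abs(d) > threshold:
            flagged.append(cov)
    return report, flagged

# Hypothetical toy data: age is imbalanced across conditions; education is not.
rows = [
    {"condition": "control", "age": 30, "educ_years": 12},
    {"condition": "control", "age": 40, "educ_years": 14},
    {"condition": "control", "age": 50, "educ_years": 16},
    {"condition": "treatment", "age": 40, "educ_years": 12},
    {"condition": "treatment", "age": 50, "educ_years": 14},
    {"condition": "treatment", "age": 60, "educ_years": 16},
]
report, flagged = balance_check(rows, ["age", "educ_years"])
print(flagged)  # ['age']
```

A flagged covariate is the cue, in the workflow above, to look more closely and possibly re-randomize within blocks of that covariate.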
RECRUITMENT. The recruitment method you use can also affect your sample. Suppose you design a political turnout study measuring whether people sampled from a voter file (a big master list of registered US voters) voted in an election after receiving a series of postcards from your organization. You have a much better chance of a representative sample here because people don’t need to agree to be in the study: you can send postcards using information from the voter file, and after the election you can look up in the same file whether each person voted.
But serious recruitment concerns come into play once you need people to respond to you. Many studies use online panels to enroll a sample. Online panels differ in quality and in their ability to target a representative sample, and the way studies are advertised to panelists differs by panel and by the organization running the study. Look for panels that let you set recruitment quotas along the demographic characteristics your study should account for, or panels where participant demographics are already known so that study invitations can be issued to panelists who meet selected criteria. Panels without these capabilities can leave you with pure convenience sampling, where whoever is available first takes your study, potentially skewing results toward certain demographics. A classic example of convenience sampling is that much academic research in the social sciences is conducted on college students enrolled in introductory or lower-level classes. College students not only generally fall within a narrow age range (roughly 18 to 22), they can also differ from the overall population in both socioeconomic status and education level.
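To make the contrast with convenience sampling concrete, here is a minimal Python sketch of quota-based recruitment: respondents are admitted in arrival order only while their demographic cell still has open slots. The cells, quota sizes, and arrival order are hypothetical, and real panels implement quotas in their own ways.

```python
def fill_quotas(arrivals, quotas):
    """Admit (person_id, cell) arrivals in order while the person's cell
    still has open slots; stop once every quota is filled."""
    remaining = dict(quotas)
    admitted = []
    for person_id, cell in arrivals:
        if remaining.get(cell, 0) > 0:
            admitted.append(person_id)
            remaining[cell] -= 1
        if not any(remaining.values()):
            break  # all quotas met; stop recruiting
    return admitted, remaining

# Hypothetical arrival order skewed toward women, as a convenience sample might be.
arrivals = [(1, "women"), (2, "women"), (3, "women"), (4, "men"),
            (5, "women"), (6, "men"), (7, "men")]
admitted, remaining = fill_quotas(arrivals, {"men": 2, "women": 2})
print(admitted)   # [1, 2, 4, 6]
print(remaining)  # {'men': 0, 'women': 0}
```

Pure convenience sampling would have taken the first four arrivals (three women and one man); the quotas instead admit two of each.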
Ultimately, recruitment should target the broadest slice of the population you are trying to study that you can access within the constraints of your investigation; this helps ensure your sampling is sound.
Following these steps and keeping these limitations and guidelines in mind will go a long way toward ensuring you have a sound sample that gives you a good read on the true value of your outcome in the population you are trying to study. In plain terms, your results will give you a good idea of what the people you are interested in actually do or think. That will help your findings replicate and give you a strong foundation for your decisions and data-based recommendations.
