Last modified 05/07/2020
o Weighting your data
• Do I need to oversample certain populations?
Oversampling increases the reliability and precision of estimates for specific population subgroups.
For example, the National Health and Nutrition Examination Survey (NHANES) oversamples subgroups of
interest, such as minorities, adolescents, and older adults, to obtain larger numbers of participants from
those subgroups [16]. Rural populations might be another subgroup to consider oversampling, to help ensure
that you achieve a sample size (N) large enough for adequate statistical power.
For subpopulations that make up at least 10% of the total population, a general sample will usually produce
reliable estimates [21]. For subpopulations between 1% and 10% of the total population, the oversampling
methods described in Examples 5A-C below are needed [21].
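As a rough illustration of the rule of thumb above, the following Python sketch maps a subgroup's share of the
total population to the suggested approach. The function name and return labels are hypothetical. The guidance
above does not address subgroups below 1%, so the sketch only flags that case rather than recommending a method.

```python
def sampling_guidance(subgroup_share):
    """Return the rule-of-thumb approach for a subgroup.

    subgroup_share is the subgroup's fraction of the total population (0 to 1).
    The labels returned here are illustrative, not official terminology.
    """
    if not 0 <= subgroup_share <= 1:
        raise ValueError("subgroup_share must be between 0 and 1")
    if subgroup_share >= 0.10:
        # At least 10%: a general sample usually gives reliable estimates.
        return "general sample"
    if subgroup_share >= 0.01:
        # Between 1% and 10%: oversampling is needed.
        return "oversample"
    # Below 1%: not covered by the rule of thumb.
    return "not covered by this rule"


for share in (0.25, 0.04, 0.005):
    print(f"{share:.1%}: {sampling_guidance(share)}")
```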
By considering oversampling clusters before the first or second stage, you have a better opportunity to net a
desired sample size.
Possible oversampling methods include increasing the number of units (e.g., census blocks) in your first
stage [22]. You could also increase the number of units (e.g., households) in the second stage [22].
An example of oversampling by selecting more clusters is a modified application of the CASPER study
design, in which the 30 × 7 design is expanded to 35 × 7, increasing the N value from 210 to 245, or about 16.7% [23].
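The arithmetic behind this example can be checked with a short Python sketch. It assumes the "30 × 7" notation
means 30 clusters with 7 sampled units in each. The function name is hypothetical.

```python
def cluster_sample_size(clusters, units_per_cluster):
    """Total sample size for a design with equal-sized cluster samples."""
    return clusters * units_per_cluster


base = cluster_sample_size(30, 7)       # standard 30 x 7 design
modified = cluster_sample_size(35, 7)   # five extra clusters

# Relative increase in N from adding clusters (5/30, about one sixth).
increase = (modified - base) / base
print(f"N goes from {base} to {modified}, an increase of {increase:.1%}")
```

The same calculation can be rerun with other cluster counts to see how many additional clusters are needed to
reach a target N.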
The NHANES study design, as another example, draws its sample in the following four stratified stages [24]:
• Stage 1: Primary sampling units (PSUs) are first stratified according to population size, and then PSUs
are selected from each stratum with probability proportional to a measure of size (PPS). PSUs are
mostly single counties or, in a few cases, groups of contiguous counties.
• Stage 2: The PSUs are divided up into segments (generally city blocks or their equivalent). As with each
PSU, sample segments are selected with PPS.
• Stage 3: Households within each segment are listed, and a sample is randomly drawn. In geographic
areas where the proportion of age, ethnic, or income groups selected for oversampling is high, the
probability of selection for those groups is greater than in other areas.
• Stage 4: Individuals are chosen to participate in NHANES from a list of all persons in selected
households. Individuals are drawn at random within designated age-sex-race/ethnicity screening
subdomains. On average, 1.6 persons are selected per household.
In stage 3, households from each segment are randomly drawn. For geographic areas where the proportion of
age, ethnic, or income groups selected for oversampling is high, the probability of selection for those groups
is greater than in other areas [16]. This approach can be replicated in other biomonitoring studies that apply
probability sampling to oversample populations of interest.
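To make the PPS selection used in stages 1 and 2 more concrete, here is a minimal Python sketch of systematic
PPS sampling, one common way to select units with probability proportional to size. It is not the NHANES
procedure itself. The function name, the unit labels, and the size values are all hypothetical.

```python
import random


def pps_systematic(sizes, n, rng=None):
    """Select n units with probability proportional to size.

    sizes maps each unit label to its measure of size (e.g., population).
    A unit whose size exceeds the sampling interval can be selected
    more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes.values())
    interval = total / n
    # Random start in [0, interval), then equally spaced selection points.
    start = rng.random() * interval
    targets = iter(start + k * interval for k in range(n))

    selected = []
    cumulative = 0
    target = next(targets, None)
    for unit, size in sizes.items():
        cumulative += size
        # Select this unit for every selection point that falls in its range.
        while target is not None and target < cumulative:
            selected.append(unit)
            target = next(targets, None)
    return selected


# Hypothetical segments with made-up household counts.
segments = {"A": 100, "B": 300, "C": 600}
print(pps_systematic(segments, 2, random.Random(0)))
```

Under this scheme a unit's expected number of selections is n × size / total, so larger units are more likely
to be drawn. One way to oversample a subgroup is to raise n in the strata where that subgroup is concentrated.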
• How do I compute weights for study participants?
Two examples of study protocols that can serve as resources for computing weights are listed below:
• NHANES provides information on the health and nutritional status of the noninstitutionalized civilian
resident population of the United States. The sample for NHANES is selected using a complex, four-
stage sample design. NHANES carries out sample weighting in three steps. The first step computes base