This article provides a detailed explanation of the weighting definitions used by the application.
Weighting efficiency
Weighting efficiency measures the impact of the weights on the original data and on the stability of subsequent statistical analyses. The measure is directly influenced by the size of the weights, especially the degree to which they differ from one. Efficiency values range from the ideal of 100 down to 0. Efficiency is 100 when all respondents receive equal weight (a weight of one) and the original data require no adjustment; this is the ideal sample situation.

As efficiency approaches zero, it signals problems with the variables used for weighting. The distribution of responses across a variable’s categories may be very skewed, or there may be too many categories; either way, one or more categories end up with very few (or zero) responses, and these small cells may cause instability in the weight calculations. Alternatively, statistical relationships between the weighting variables may be so strong that cross-classifying these highly correlated variables creates small cell counts. Or too much may simply be expected of a sample that differs considerably from the target population; weighting should be viewed as a minor cosmetic adjustment to align the sample data. When problems like these occur, very large weights (e.g., values of 5 or more) and very small weights (e.g., values of .2 or less) are calculated and applied to the original data, leading to low efficiency. The smaller the efficiency, the more the weights struggle to adjust the data to match what may be unreasonable targets. At the end of the weighting procedure, weights may still be calculable, but the statistical price paid is low efficiency and a resulting loss of statistical sensitivity.
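One common way to quantify this idea is Kish's effective-base formula, which scores efficiency as (Σw)² / (n·Σw²) × 100. The application may use this definition or a close variant; the sketch below assumes the Kish version.

```python
def weighting_efficiency(weights):
    """Kish weighting efficiency as a percentage (100 = all weights equal)."""
    n = len(weights)
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    return 100.0 * total * total / (n * sum_of_squares)

# Equal weights of one: the ideal case, no adjustment needed.
print(weighting_efficiency([1.0, 1.0, 1.0, 1.0]))  # 100.0

# Very large (5) and very small (.2) weights: efficiency drops sharply.
print(weighting_efficiency([5.0, 0.2, 0.2, 5.0]))
```

Note how the measure depends only on how far the weights spread away from a common value, not on the data being weighted.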
While there is no generally accepted cut-off separating strong from weak efficiencies, an efficiency below 40% is problematic. It is often indicative of considerable sample imbalance relative to the desired population targets, resulting in very large and very small weights. Weak weighting efficiency is not a reflection of the quality of the data values themselves; rather, it reflects the loss in statistical precision when statistical analyses are applied. For example, weak efficiency increases standard errors and confidence intervals for statistical summaries and makes statistical tests less sensitive. In addition to this loss of precision, weighting may mask sampling problems.

As a possible solution, the sample can be augmented (e.g., by including more respondents whose characteristics most resemble those assigned the greatest weights). Perhaps more practically, the weighting process can be modified. Begin by comparing the actual distribution of each weighting variable to the desired target distribution. On a cell-by-cell basis, differences in percentages exceeding 10 points (e.g., members whose ages span 25 to 34 account for 30% of the sample but are required to account for 15% of the target population) indicate imbalances that lead to low efficiencies. Consider combining adjacent cells to smooth out some of the difference, adjusting the target population definition, or altering the analysis plan to accommodate the sample composition (e.g., filter the sample by age and run the analysis separately by age group). Next, reduce the number of variables used as the basis for weighting; ideally, no more than 4 or 5 variables should be used. For each variable, reduce the number of response categories: three or four categories per variable may be sufficient, and no more than 5 should be used.
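The cell-by-cell comparison described above can be sketched as follows. The helper name and the 10-point threshold default are illustrative, not part of the application.

```python
def flag_imbalances(sample_pct, target_pct, threshold=10.0):
    """Return categories where sample and target percentages differ by
    more than `threshold` points (hypothetical helper for illustration)."""
    return {category: sample_pct[category] - target_pct[category]
            for category in sample_pct
            if abs(sample_pct[category] - target_pct[category]) > threshold}

# Example from the text: the 25-34 group is 30% of the sample
# but only 15% of the target population.
sample = {"18-24": 20, "25-34": 30, "35-44": 25, "45+": 25}
target = {"18-24": 25, "25-34": 15, "35-44": 30, "45+": 30}
print(flag_imbalances(sample, target))  # {'25-34': 15}
```

Any category flagged here is a candidate for combining with an adjacent cell or revisiting the target definition before weighting is run.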
Before beginning the weighting process, examine relationships and response patterns across the variables to be used. At the simplest, cross-tabulate all pairs of variables to get a sense of the strength of the bivariate relationships, and remove any variables that are strongly related to others. At the other extreme, generate all combinations of response categories to identify any missing or infrequently encountered cells.
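The "all combinations" check at the end of the paragraph can be sketched with standard-library tools. The respondent records and category lists below are made up for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical respondent records: (age group, region).
respondents = [
    ("18-34", "north"), ("18-34", "north"), ("35-54", "south"),
    ("55+", "north"), ("35-54", "north"), ("18-34", "south"),
]

observed = Counter(respondents)
ages = ["18-34", "35-54", "55+"]
regions = ["north", "south"]

# Enumerate every age-by-region cell to find empty or sparse cells
# before weighting is attempted.
for cell in product(ages, regions):
    count = observed.get(cell, 0)
    if count == 0:
        print("empty cell:", cell)
    elif count <= 1:
        print("sparse cell:", cell)
```

Empty or near-empty cells found this way are exactly the ones that produce the extreme weights and low efficiencies discussed above.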
Effective sample size
When weighting is applied to a sample of data, the effective sample size is calculated as the original sample size multiplied by the weighting efficiency (expressed as a proportion ranging from 0 to 1). The effective sample size is then used in subsequent statistical analyses, for example as the base size (denominator) when estimating the standard errors of statistics such as averages and percentages. The effective sample size makes explicit the effect of weighting on statistical analyses applied to the weighted data. (Weighting also affects the variability in the data, but this is less obvious when only viewing the results of statistical tests.) Small efficiencies reduce the effective sample size considerably, thereby reducing the sensitivity (i.e., power) of the statistical tests.
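The calculation is a single multiplication, and its effect on a standard error follows directly. A minimal sketch, with the efficiency given as a percentage:

```python
import math

def effective_sample_size(n, efficiency_pct):
    """Effective base: the original n scaled by the weighting efficiency."""
    return n * efficiency_pct / 100.0

def se_of_proportion(p, n_eff):
    """Standard error of a proportion using the effective base."""
    return math.sqrt(p * (1 - p) / n_eff)

# A sample of 1,000 with 40% weighting efficiency behaves like n = 400.
n_eff = effective_sample_size(1000, 40)
print(n_eff)  # 400.0

# The standard error of a 50% estimate grows accordingly.
print(round(se_of_proportion(0.5, 1000), 4))  # unweighted base
print(round(se_of_proportion(0.5, n_eff), 4))  # effective base
```

Comparing the two standard errors makes the loss of precision concrete: the same estimate is noticeably less stable on the smaller effective base.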
Winsorizing weights
An additional consideration when applying weights is the size of the weights themselves, calculated for each respondent or member. To control weighting efficiency (and not reduce the effective sample size too drastically), the weights are “Winsorized”: all weights greater than 5 are constrained to equal 5, and all weights smaller than .2 are constrained to equal .2. Note that Winsorization trades off the possibility that the weighted sample doesn't exactly replicate the "true population" against the desire for better statistical precision.
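Winsorizing as described amounts to clamping each weight into a fixed range. A minimal sketch using the bounds stated in the article (.2 and 5):

```python
def winsorize_weights(weights, low=0.2, high=5.0):
    """Clamp each weight into [low, high]; bounds per the article."""
    return [min(max(w, low), high) for w in weights]

# Extreme weights are pulled back to the bounds; others are untouched.
print(winsorize_weights([0.05, 0.8, 1.0, 7.3]))  # [0.2, 0.8, 1.0, 5.0]
```

Because the extreme weights are the ones that dominate the sum of squared weights, clamping them directly raises the weighting efficiency and the effective sample size.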
Effect of filtering on weighting
The ideal weighting process would require that the customer/user have targets for each possible filter of interest to them. However, it is most unlikely that such targets would be known or available. As such, the goal is to provide some form of data protection when samples deviate from population targets. The weighting approach taken here also provides a basis for comparability across filtered subgroups.