libname nhanes 'e:\EPIC Class\NHANES\data\';
*libname nhanes 'C:\Documents and Settings\KeyesK\Desktop\EPIC class';
/******************************************Lab 5**********************************************/
/****************************************Weighting*****************************************/
/*********************************************************************************************/
/*First, let's create variables that indicate whether the person has
data for BMI (RESP=1, RESP0=1) or not (RESP=2, RESP0=0).*/
data BMI; set nhanes.depressionBMI;
if BMXBMI=. then RESP=2; else RESP=1;
if BMXBMI=. then RESP0=0; else RESP0=1;
run;
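/*Sketch, assuming nhanes.depressionBMI is not already sorted by the design
variables: SAS-callable SUDAAN normally expects its input to be sorted by
the NEST variables unless the NOTSORTED option is given on the PROC
statement, so we sort BMI once here. The PROC FREQ step is a quick check
that RESP (1=respondent, 2=nonrespondent) and RESP0 (1=respondent,
0=nonrespondent) agree: every record should fall in either the
RESP=1/RESP0=1 cell or the RESP=2/RESP0=0 cell.*/
proc sort data=BMI;
  by SDMVSTRA SDMVPSU;
run;

proc freq data=BMI;
  tables RESP*RESP0 / missing norow nocol nopercent;
run;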
/*Next, let's calculate the weighted sums for our explanatory variables.*/
PROC CROSSTAB data=BMI design=wr ;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
subgroup RESP RIAGENDR RIDRETH1;
levels 2 2 5;
tables RESP*(RIAGENDR RIDRETH1) RESP*RIAGENDR*RIDRETH1;
run;
**What happened? How would we fix this?******;
PROC CROSSTAB data=BMI design=wr ;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
subgroup RESP RIAGENDR RIDRETH1;
levels 2 2 5;
tables RESP*(RIAGENDR RIDRETH1) RESP*RIAGENDR*RIDRETH1;
SETENV decwidth=2 colwidth=12;
PRINT NSUM WSUM /
FILETYPE=RTF FILENAME="E:\EPIC class\Class 5\WEIGHT_OUT1.rtf" REPLACE
FONTNAME="Arial" FONTSIZE=10 STYLE=NCHS
TOPINCH=1 LEFTINCH=1 RIGHTINCH=2.5 BOTTOMINCH=2.5;
run;
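/*Optional cross-check (a sketch) using SAS/STAT PROC SURVEYFREQ instead of
SUDAAN. With STRATA, CLUSTER and WEIGHT it treats the design as stratified
sampling of PSUs with replacement, which we assume corresponds to
DESIGN=WR with NEST SDMVSTRA SDMVPSU. The weighted frequencies should closely
match the WSUM values above and the unweighted frequencies should match
NSUM; standard errors may differ slightly between the two packages.*/
proc surveyfreq data=BMI;
  strata SDMVSTRA;
  cluster SDMVPSU;
  weight SAMPWEIGHT;
  tables RESP*RIAGENDR RESP*RIDRETH1 RESP*RIAGENDR*RIDRETH1;
run;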
/*We seek to create nonresponse adjustments that will force the reweighted
respondent totals (i.e., RESP0=1) to equal the weighted sample totals (i.e., RESP0 = 0 or 1)
across the levels of race (RIDRETH1), gender (RIAGENDR), and the interaction
of race with gender.*/
PROC WTADJUST data=BMI design=wr ADJUST=NONRESPONSE;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
CLASS RIDRETH1 RIAGENDR;
MODEL RESP0= RIDRETH1 RIAGENDR RIDRETH1*RIAGENDR;
run;
/*Page 4 of the output shows that the weighted response rate varies from 93.9% for Mexican American men
to 96.3% for white men (the weighted response rate associated
with INTERCEPT is the response rate for the whole sample).
We also see the respondent and nonrespondent sample sizes; these match what we
saw in the crosstab above. No weights were trimmed, because no WTMIN or WTMAX
was specified in this run.
*/
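/*Sketch: the weighted response rate in each race-by-gender cell is the
weighted mean of RESP0 (1=respondent, 0=nonrespondent). PROC SURVEYMEANS
(SAS/STAT, not SUDAAN) with a DOMAIN statement estimates these cell means
with design-based standard errors; the estimates should correspond to the
rates reported on page 4 (for example, about 0.939 for Mexican American men
and 0.963 for white men).*/
proc surveymeans data=BMI mean stderr;
  strata SDMVSTRA;
  cluster SDMVPSU;
  weight SAMPWEIGHT;
  var RESP0;
  domain RIDRETH1*RIAGENDR;
run;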
/*Page 5 shows the marginal weight adjustment, defined as the
control totals divided by the weighted respondent totals. The
weighted respondent totals are computed using the trimmed
sample weights. The interpretation here is that each respondent
represents him/herself plus, on average, 0.0449 of a nonrespondent.
Use this column to set LOWERBD and UPPERBD, the bounds on the weight adjustment factors.
Pages 5 and 6 show that the unequal weighting effect is the same before and after weight trimming:
the original and trimmed values are equal.
This is expected, because we did not specify weight trimming.
The final unequal weighting effect incorporates the weight trimming factors, the marginal adjustments,
and the original sample weights.
*/
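/*Sketch: the unequal weighting effect (UWE) is commonly defined (Kish) as
1 + CV^2 of the weights, which equals n times sum(w^2) divided by
(sum(w))^2. We assume this matches the definition SUDAAN uses, so the value
computed here for the original weights of the respondents (RESP0=1) should
be close to the original UWE reported by WTADJUST. Records with a zero or
missing weight are excluded (an assumption). The data set name UWE_CHECK is
hypothetical.*/
proc sql;
  create table UWE_CHECK as
  select count(SAMPWEIGHT) as n,
         sum(SAMPWEIGHT) as sum_w,
         count(SAMPWEIGHT)*sum(SAMPWEIGHT*SAMPWEIGHT)
           / (sum(SAMPWEIGHT)*sum(SAMPWEIGHT)) as uwe
  from BMI
  where RESP0=1 and SAMPWEIGHT > 0;
quit;

proc print data=UWE_CHECK;
run;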
/*Page 7 shows that the race by gender interaction term is
not significant in this model. In some applications, one may want to reduce
the number of explanatory variables in the model, for example, by removing
the nonsignificant covariates. Reducing the number of explanatory variables
will tend to reduce the unequal weighting effect, thereby possibly reducing
the variance of subsequent estimates generated with the final adjusted
weight. In other applications, one may want to force certain variables into the
adjustment process, primarily to reduce bias in estimates generated with the
final adjusted weight. In this example, we keep the main effects of
race and gender in the model, as well as the interaction of race and gender,
regardless of their statistical significance.
*/
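/*Sketch of the alternative described above, for comparison only: drop the
nonsignificant race-by-gender interaction and keep only the main effects.
The rest of this lab keeps the interaction. Comparing the unequal weighting
effect reported by this run with the one from the full model shows how much
the simpler model changes it.*/
PROC WTADJUST data=BMI design=wr ADJUST=NONRESPONSE;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
CLASS RIDRETH1 RIAGENDR;
MODEL RESP0= RIDRETH1 RIAGENDR;
run;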
/*In general, the next step in a weight adjustment process is to establish weight
truncation bounds, if desired. Any weight larger than a specified maximum
weight, or smaller than a specified minimum weight, will be trimmed or
padded to meet the appropriate bound.
First we set the bounds on the sample weights. These can be provided by the user
with the WTMIN and WTMAX statements; if they are not provided, SUDAAN assumes
a minimum of 0 and a maximum of 10^20.
In this example, we impose a
minimum weight threshold of 1000 and a maximum weight threshold of 80000
(arbitrary decision).
If we wanted to go through the 3*IQR process, we would start with:
proc sort data=BMI; by SDMVSTRA; run;
proc univariate data=BMI; var SAMPWEIGHT; by SDMVSTRA; run;
*/
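/*Sketch of the 3*IQR approach, computed within each stratum. Assumption:
the bounds are taken as Q1 - 3*IQR and Q3 + 3*IQR (some applications use
the median instead of the quartiles). PROC MEANS with a CLASS statement does
not require the data to be sorted. Turning these stratum-specific bounds
into the single WTMIN and WTMAX values used below remains a judgment call.
The data set names WT_Q and WT_BOUNDS are hypothetical.*/
proc means data=BMI noprint nway;
  class SDMVSTRA;
  var SAMPWEIGHT;
  output out=WT_Q q1=q1 median=med q3=q3;
run;

data WT_BOUNDS;
  set WT_Q;
  iqr = q3 - q1;
  lower = max(0, q1 - 3*iqr); /* weights cannot be negative */
  upper = q3 + 3*iqr;
run;

proc print data=WT_BOUNDS;
  var SDMVSTRA q1 med q3 iqr lower upper;
run;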
PROC WTADJUST data=BMI design=wr ADJUST=NONRESPONSE;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
IDVAR seqn RESP0 RIDRETH1 RIAGENDR;
CLASS RIDRETH1 RIAGENDR;
MODEL RESP0= RIDRETH1 RIAGENDR RIDRETH1*RIAGENDR;
WTMIN 1000;
WTMAX 80000;
run;
/*Notice the number of weights that were
trimmed on page 5; pages 5-6 indicate that the overall unequal weighting effect was
reduced slightly. */
/*Next, we turn our attention to setting appropriate bounds on the weight
adjustment procedure.
These are the lower and upper bounds, l_k and u_k, in the weight adjustment model.
Page 5 indicates that the minimum
nonresponse adjustment observed over the entire respondent sample is 1.0384
(see line for Intercept). In general, for nonresponse applications, we want
the nonresponse adjustment to be one or greater so that every
respondent represents themselves in the final estimate, as well as some
portion of the nonrespondents. Page 5 also indicates that the
maximum nonresponse adjustment over the entire respondent sample is
1.0652. This indicates that the nonresponse adjustment has a fairly
minimal effect, which is consistent with the high response rates
(93.9% to 96.3%) and probably also with the variables we chose not
predicting missingness very well.
Note that in many applications, reducing the upper bound can actually
increase the effect of unequal weighting on your estimates, so monitoring
the unequal weighting effect during this process is important.*/
/*The Marginal Weight Adjustment on page 4 provides some
guidance on feasible values for the upper and lower bounds on the
nonresponse adjustment. Upper and lower bounds can be set for the whole
sample, or for subgroups of the sample. If you are interested in establishing
one upper and one lower bound that would apply to the entire sample, then
the lower bound must be set to something smaller than the smallest number
that appears in this column (1.0384), and the upper bound must be set to
something greater than the largest number that appears in this column
(1.0652).
Suppose we had ignored the Marginal Weight Adjustment and set the lower
bound to 1.04 and the upper bound to 1.06.*/
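/*Sketch: because the model includes race, gender and their interaction, it
is saturated in the race-by-gender cells. Assuming the bounds do not bind,
the adjustment in each cell is then the control total divided by the
weighted respondent total, i.e. 1 / (weighted response rate). This uses the
untrimmed SAMPWEIGHT, so min_adj and max_adj should be close to, but not
necessarily equal to, the 1.0384 and 1.0652 reported by WTADJUST after
trimming. A pair of bounds can only work if LOWERBD < min_adj and
UPPERBD > max_adj; the flags check the two pairs used in this lab
(1 = satisfied, 0 = not). The data set name CELL_ADJ is hypothetical.*/
proc sql;
  create table CELL_ADJ as
  select RIDRETH1, RIAGENDR,
         sum(SAMPWEIGHT)           as control_total,
         sum(SAMPWEIGHT*(RESP0=1)) as resp_total,
         calculated control_total / calculated resp_total as adj
  from BMI
  group by RIDRETH1, RIAGENDR;

  select min(adj) as min_adj,
         max(adj) as max_adj,
         (1.04 < min(adj) and 1.06 > max(adj)) as ok_104_106,
         (1.03 < min(adj) and 1.07 > max(adj)) as ok_103_107
  from CELL_ADJ;
quit;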
PROC WTADJUST data=BMI design=wr ADJUST=NONRESPONSE;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
IDVAR seqn RESP0 RIDRETH1 RIAGENDR;
CLASS RIDRETH1 RIAGENDR;
MODEL RESP0= RIDRETH1 RIAGENDR RIDRETH1*RIAGENDR;
WTMIN 1000;
WTMAX 80000;
LOWERBD 1.04;
UPPERBD 1.06;
run;
/*Look at the log -- what kind of warning do you get?*/
/*Look at page 1 of the output -- what do the *** for betas tell you?*/
/*Let's rerun with more appropriate upper and lower bounds for the nonresponse weight adjustment*/
PROC WTADJUST data=BMI design=wr ADJUST=NONRESPONSE;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
IDVAR seqn RESP0 RIDRETH1 RIAGENDR;
CLASS RIDRETH1 RIAGENDR;
MODEL RESP0= RIDRETH1 RIAGENDR RIDRETH1*RIAGENDR;
WTMIN 1000;
WTMAX 80000;
LOWERBD 1.03;
UPPERBD 1.07;
OUTPUT / PREDICTED=ALL FILENAME=outsud FILETYPE=SAS REPLACE;
run;
/*This run of WTADJUST ran successfully.
The minimum observed weight
adjustment is 1.0384, and the maximum observed weight adjustment is
1.0652. The overall unequal weighting effect changed from 1.7833 to 1.7827
after weight trimming, and to 1.7739 after the nonresponse adjustment was
applied.
The last columns of the output show the unequal weighting effect
associated with the original respondent weights, the weights after
weight trimming, and the weights after the nonresponse weight adjustment
is applied.
In this example, the weight adjustment decreases the unequal weighting
effect by a very small amount (i.e., 1.7739-1.7833 = -0.0094). Although it
decreased slightly in this case, it is not uncommon to see the unequal
weighting effect increase after applying a weight adjustment. Because the
unequal weighting effect decreases here, we would expect the precision of
any estimate produced using the newly adjusted weights to increase by a
very small amount due to the effects of unequal weighting alone (a variance
ratio of roughly 1.7739/1.7833 = 0.995, i.e., about 0.5% lower).
Notice that an OUTPUT statement was included in the last WTADJUST procedure.
The data set OUTSUD contains one record for every record in
the input data set. It contains several variables, including the
final weight trimming adjustment (stored in the variable TRIMFACTOR),
the final nonresponse adjustment (stored in the variable ADJFACTOR), and
the final nonresponse adjusted weight (stored in the variable WTFINAL).*/
/*It is always a good idea to check the weight sums
both before and after any weight adjustment is applied. Here, we check the
weight sums by running a CROSSTAB with the final adjusted weight
WTFINAL.*/
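/*Sketch of an additional check on OUTSUD, run before RESP0 is recoded
below. Among respondents (RESP0=1), every ADJFACTOR should lie within the
LOWERBD and UPPERBD used above (1.03 and 1.07), and the sum of WTFINAL over
respondents should reproduce the weighted total of the full sample after
trimming. That total will be close to, but not necessarily equal to, the
sum of the untrimmed SAMPWEIGHT over BMI, printed for comparison. We assume
only the variables named above and on the IDVAR statement are on OUTSUD.*/
proc means data=outsud n min max sum;
  where RESP0=1;
  var ADJFACTOR TRIMFACTOR WTFINAL;
run;

proc means data=BMI n sum;
  var SAMPWEIGHT;
run;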
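/*SUBGROUP variables in CROSSTAB take the values 1 to LEVELS, so recode
nonrespondents from 0 to 2 to match the coding of RESP.*/
data outsud; set outsud;
if RESP0=0 then RESP0=2;
run;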
***Original crosstab****;
PROC CROSSTAB data=BMI design=wr ;
nest SDMVSTRA SDMVPSU/missunit;
weight SAMPWEIGHT;
subgroup RESP RIAGENDR;
levels 2 2 ;
tables (RIAGENDR)*RESP;
SETENV decwidth=0 colwidth=10;
print wsum nsum;
run;
***New weight crosstab*******;
PROC CROSSTAB data=outsud design=wr ; *<-- changed the dataset name;
nest SDMVSTRA SDMVPSU/missunit;
weight WTFINAL; *<-- changed the weight name;
subgroup RESP0 RIAGENDR;
levels 2 2 ;
tables (RIAGENDR)*RESP0;
SETENV decwidth=0 colwidth=10;
print wsum nsum;
run;