Katherine Keyes



Epidemiology and Population Health Summer Institute at Columbia University


Analysis of Complex Survey Data




CLASS SESSIONS

Monday, June 13, 2011 – Friday, June 17, 2011
1:00 PM to 5:00 PM

INSTRUCTOR
Katherine M. Keyes, PhD, MPH
Columbia University Epidemiology Merit Fellow
(212) 543-5002
Kmk2104@columbia.edu
722 West 168th Street, 2nd Floor, Suite 229C

COURSE DESCRIPTION
This course will provide participants with practical skills to analyze data arising from complex epidemiologic sampling designs. Complex survey data violate typical assumptions about simple random samples of independent observations, thus requiring specialized statistical techniques. National Health and Nutrition Examination Survey (NHANES) data will be used for applied demonstrations, illustrating concepts applicable to all data sets arising from complex survey designs. We will discuss the theory behind complex sampling strategies and the necessity of applying appropriate statistical techniques to analyze these data and make valid inferences. We will demonstrate the appropriate use of sampling weights in the NHANES data and how the appropriate weight is specific to the research question being asked. We will demonstrate how to obtain basic descriptive statistics, appropriate variance estimates, regression parameters, and survival analysis output in SAS and SAS-callable SUDAAN software.
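To illustrate why sampling weights matter, the following minimal sketch contrasts an unweighted mean with a weighted mean. It is written in Python for brevity rather than in the SAS and SUDAAN used in class. The data values, the weights, and the function name `weighted_mean` are hypothetical and are not taken from NHANES.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * y_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical sample: three respondents from an oversampled group
# (small weights) and one respondent from an undersampled group
# (large weight). None of these numbers are real survey values.
values = [10.0, 10.0, 10.0, 20.0]
weights = [100.0, 100.0, 100.0, 700.0]

unweighted = sum(values) / len(values)     # treats every respondent equally
weighted = weighted_mean(values, weights)  # each respondent counts as w_i people

print("unweighted mean:", unweighted)
print("weighted mean:", weighted)
```

The sketch covers only the point estimate. Design-based variance estimation also uses the stratum and cluster information from the sampling design, which this example does not attempt.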

COURSE LEARNING OBJECTIVES
1. Understand the theoretical justifications for complex survey designs, and why special statistical procedures are needed.
2. Learn how to find publicly available data online, download and organize datasets, and manipulate data for analysis.
3. Use SUDAAN software to analyze complex survey data, and be able to do basic frequency and crosstab analysis, linear and logistic regression, and Cox proportional hazards models.

RECOMMENDED COURSE READING LIST

The recommended textbook is:
Research Triangle Institute (2008). SUDAAN Language Manual, Release 10.0. Research Triangle Park, NC: Research Triangle Institute.



COURSE STRUCTURE
Class time is 20 hours total. The structure of each class will be:
1:00-2:00 Lecture
2:00-3:30 Guided exercise
3:30-3:45 Break
3:45-5:00 Independent research project

Each class is divided into three parts:

Lecture. Dr. Keyes will give an overview of the topic of the day, review basic statistical theory and practice, and introduce the use of SUDAAN software for that topic.

Guided exercise. Dr. Keyes will provide example code for the topics of the day and an in-class assignment sheet. Students will run the code provided and answer the in-class assignment questions. We will discuss the answers as a group.

Independent research project. On the first day of class, students will identify a research question of interest in the NHANES data. By the end of the week, students will have completed an analysis of this research question on their own using the tools discussed in lecture.

COURSE SCHEDULE

Session 1 – Introduction to complex survey data
6-13-11
Lecture:
Discuss the structure of common complex sampling designs
Discuss finding publicly available data, downloading and organizing data
Merge data in SAS; manage and manipulate study variables

In-Class Assignment: Locate variables, download data, append and merge data, identify, recode, and evaluate missing data in SAS

Independent practice: Select a research question in NHANES that will guide your independent practice throughout the week. Locate variables, download data, append and merge data, identify, recode, and evaluate missing data in SAS.
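The append, merge, and recode steps named above can be sketched as follows. This is a minimal illustration in Python, not the SAS code used in the lab. The column names (`id`, `age`, `smoker`), the missing-value codes 7 and 9, and all helper names are hypothetical; check the codebook of each variable for its actual missing codes.

```python
def append_rows(*tables):
    """Stack several tables (lists of dict rows) into one list."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


def merge_by_id(left, right, key):
    """Left join: keep every row of `left`, add matching columns from `right`."""
    right_by_id = {row[key]: row for row in right}
    merged = []
    for row in left:
        combined = dict(row)
        combined.update(right_by_id.get(row[key], {}))
        merged.append(combined)
    return merged


def recode_missing(rows, column, missing_codes):
    """Replace special codes (e.g. 'refused', 'don't know') with None."""
    recoded = []
    for row in rows:
        new_row = dict(row)
        if new_row.get(column) in missing_codes:
            new_row[column] = None
        recoded.append(new_row)
    return recoded


# Hypothetical demographic files from two survey cycles, plus a questionnaire file.
cycle_a_demo = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
cycle_b_demo = [{"id": 3, "age": 47}]
questionnaire = [
    {"id": 1, "smoker": 1},
    {"id": 2, "smoker": 9},  # assumed "don't know" code
    {"id": 3, "smoker": 7},  # assumed "refused" code
]

demo = append_rows(cycle_a_demo, cycle_b_demo)      # append cycles
merged = merge_by_id(demo, questionnaire, "id")     # merge on respondent id
clean = recode_missing(merged, "smoker", {7, 9})    # recode missing codes

n_missing = sum(1 for row in clean if row.get("smoker") is None)
print("rows:", len(clean), "missing smoker:", n_missing)
```

Counting the missing values after recoding, as in the last two lines, is one simple way to evaluate how much data each variable is missing before analysis.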

Required reading:
NHANES general data release document: http://www.cdc.gov/nchs/data/nhanes/nhanes_05_06/general_data_release_doc_05_06.pdf

Session 1 Files
Here is the course syllabus:



Here are the PowerPoint slides for today's class:


Here is the Word document that guides you through extracting files from the NHANES website:

Here is an Excel spreadsheet to use as the basis of your data dictionary:

Here is the SAS program file that we will use for the lab today:


Session 2 – Univariate statistics
6-14-11
Lecture:
Introduction to SUDAAN software basic language
Create macros in SAS to output large quantities of univariate statistics
Review application of basic statistical tests (what tests, when, why, assumptions)

In-Class Assignment: Practice the CROSSTAB, RATIO, and DESCRIPT procedures in SUDAAN to produce univariate and bivariate statistics, including chi-square and t-tests

Independent practice: Students will create a table of univariate and bivariate statistics associated with their research question of interest, categorize variables appropriately, and apply chi-square and t-tests. They will also extract and code potential confounders such as sex, race, age, and income.

Session 2 Files

Here are the PowerPoint slides for today's class:



Here is the SAS program file for lab:



Here is the dataset that we created yesterday:


Session 3 – Regression
6-15-11
Lecture:
Introduction to linear, logistic, Poisson, and polytomous regression procedures in SUDAAN
Review application of regression procedures (what procedure, when, why, assumptions)
Discuss variable categorization (when are categories necessary, how should you categorize your variables)

In-Class Assignment: REGRESS, RLOGIST, MULTILOG, and LOGLINK procedures in SUDAAN

Independent practice: Students will conduct and interpret regression procedures related to their research question of interest.


Session 3 Files

Here are the PowerPoint slides for today's class:


Here is the SAS program file for today's lab:

Session 4 – Survival analysis
6-16-11
Lecture:
Introduction to survival analysis and Cox proportional hazards models in SUDAAN
Discuss differences between the Kaplan-Meier and life-table methods
Discuss assumptions of Cox proportional hazards models

In-Class Assignment: KAPMEIER and SURVIVAL procedures in SUDAAN

Independent practice: Students will conduct and interpret survival analyses related to their research question of interest.
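As a reminder of what a Kaplan-Meier curve computes, here is a minimal sketch of the product-limit estimator in Python. It is an unweighted textbook version on hypothetical data: it ignores sampling weights and design information, which the survey procedures covered in class are meant to handle. The function name `kaplan_meier` and the example values are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    survival = 1.0
    curve = []
    # The estimate only changes at times where at least one event occurs.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data: five subjects, two of them censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"time {t}: S(t) is about {s:.2f}")
```

In this example the estimate drops at times 1, 2, and 3 (to about 0.80, 0.60, and 0.30). The subject censored at time 2 still counts in the risk set at time 2 but contributes no event.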

Session 4 Files

Here are the PowerPoint slides for today's class:

Here is the SAS program file for today's lab:


Here is an Excel file with a formatted Kaplan-Meier curve:


Here is the lab from yesterday's class with code for predicted and conditional marginals added:


Here is the SAS PROC SURVEY code:


Session 5 – Special procedures and options
6-17-11
Lecture:
Discuss data imputation: theories and assumptions
Discuss weight development

In-Class Assignment: HOTDECK for imputation and WTADJUST for creating weights

Independent practice: Students will finish their independent research project and practice imputing data for at least one of their variables of interest.
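The general idea behind hot-deck imputation can be sketched as follows: a missing value is filled in with an observed value from a "donor" in the same class. This Python example is a simple unweighted random hot deck on hypothetical data. It is not a description of the algorithm that the SUDAAN HOTDECK procedure uses, and the column names and the function name `hot_deck_impute` are assumptions for illustration.

```python
import random


def hot_deck_impute(rows, column, class_column, seed=0):
    """Fill missing values (None) in `column` with a randomly chosen
    observed value from a donor in the same `class_column` group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Collect observed (donor) values within each imputation class.
    donors = {}
    for row in rows:
        if row[column] is not None:
            donors.setdefault(row[class_column], []).append(row[column])

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row[column] is None:
            pool = donors.get(new_row[class_column])
            if pool:  # stays missing if the class has no donors
                new_row[column] = rng.choice(pool)
        imputed.append(new_row)
    return imputed


# Hypothetical data: income category is missing for two respondents.
rows = [
    {"id": 1, "sex": "F", "income": 3},
    {"id": 2, "sex": "F", "income": None},
    {"id": 3, "sex": "F", "income": 5},
    {"id": 4, "sex": "M", "income": None},  # no male donors available
]

completed = hot_deck_impute(rows, "income", "sex", seed=42)
for row in completed:
    print(row)
```

Respondent 2 receives the income value of one of the two female donors, while respondent 4 stays missing because no donor exists in that class. Choosing imputation classes so that each one contains donors is part of the assumptions discussed in lecture.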


Session 5 Files

Slides


Code for imputation lab


Code for weighting lab