On Sat, Aug 11, 2012 at 5:53 AM, Anthony Damico <ajdam...@gmail.com> wrote:
> Hi everyone, my apologies in advance if I'm overlooking something simple in
> this question.  I am trying to use R's survey package to make a direct
> method age-adjustment to some complex survey data.  I have played with
> postStratify, calibrate, rake, and simply multiplying the base weights by
> the correct proportions - nothing seems to hit the published numbers on the
> nose.

<snip>

> # but matching the figure exactly requires an exact age adjustment.
>
> # create the population types vector
> pop.types <-
>     data.frame(
>         agecat = 0:3 ,
>         Freq = c( 55901 , 77670 , 72816 , 45364 )
>     )
>
>
> z.postStratified <- postStratify( z , ~agecat , pop.types , partial = T )

The standardisation in the CDC examples is within each subpopulation. That is, they standardise each race/ethnicity group to the Census age structure, rather than standardising the whole population. That's the whole point -- they want to look at an imaginary population where age and race aren't confounded.

When I do this, it almost exactly matches. The next step was to drop all the missing data and reweight just the non-missing data. That works exactly. (I also think you have the wrong recoding of RIDRETH1.)

library(foreign)   # for read.xport()
library(survey)

demog <- read.xport("~/Downloads/demo_f.xpt")
chol <- read.xport("~/Downloads/TCHOL_f.xpt")
alldata <- merge(demog, chol)
alldata <- subset(alldata, RIDSTATR %in% 2)
alldata <- transform(alldata, HI_CHOL = ifelse(LBXTC >= 240, 1, 0))
# collapse the five RIDRETH1 codes into four groups (codes 1 and 2 combined)
alldata <- transform(alldata, race = c(1, 1, 2, 3, 4)[RIDRETH1])
alldata <- transform(alldata, agecat = cut(RIDAGEYR, c(0, 19, 39, 59, Inf)))

# the design has to exist before svytable() can use it
design <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, nest = TRUE,
                    weights = ~WTMEC2YR, data = alldata)

# standard population by age group
popage <- c(55901, 77670, 72816, 45364)

# spread each race x gender total over the standard age distribution
racegender <- as.data.frame(svytable(~race + RIAGENDR, design))
racegenderage <- expand.grid(race = 1:4, RIAGENDR = 1:2,
                             agecat = levels(alldata$agecat))
racegenderage$Freq <- as.vector(outer(racegender$Freq, popage / sum(popage)))

# all examined participants: almost exactly matches
svyby(~HI_CHOL, ~race + RIAGENDR,
      design = subset(postStratify(design, ~race + RIAGENDR + agecat, racegenderage),
                      RIDAGEYR >= 20),
      svymean, na.rm = TRUE)

# drop missing cholesterol values, then reweight: matches exactly
somedata <- subset(alldata, !is.na(LBXTC))
design1 <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, nest = TRUE,
                     weights = ~WTMEC2YR, data = somedata)
svyby(~HI_CHOL, ~race + RIAGENDR,
      design = subset(postStratify(design1, ~race + RIAGENDR + agecat, racegenderage),
                      RIDAGEYR >= 20),
      svymean, na.rm = TRUE)

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
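
For anyone unsure why the outer() / expand.grid() construction of racegenderage lines up, here is a small base-R sketch of the same idea. It is only an illustration: the subgroup totals are made up (they are not NHANES estimates), only two race groups are used to keep it short, and the names groups, target, ageshare and agelevels are hypothetical. The popage values are the ones from the message.

```r
# Toy illustration of "standardise within each subgroup".
# The subgroup totals below are invented, NOT survey estimates.

# Standard population by age group (values from the message above)
popage <- c(55901, 77670, 72816, 45364)
ageshare <- popage / sum(popage)

# Hypothetical weighted totals for 2 race groups x 2 genders, listed with
# race varying fastest (the order as.data.frame() of a two-way table gives)
groups <- expand.grid(race = 1:2, RIAGENDR = 1:2)
groups$Freq <- c(1000, 2000, 1500, 2500)

# Target table: race varies fastest, then gender, then age group
agelevels <- c("(0,19]", "(19,39]", "(39,59]", "(59,Inf]")
target <- expand.grid(race = 1:2, RIAGENDR = 1:2, agecat = agelevels)

# outer() builds a (subgroup x age) matrix; as.vector() unrolls it column by
# column, which is the same row order expand.grid() produced above
target$Freq <- as.vector(outer(groups$Freq, ageshare))

# Summing over age recovers each subgroup's original total ...
totals <- xtabs(Freq ~ race + RIAGENDR, data = target)
print(totals)

# ... and within a subgroup the age shares equal the standard population's
one <- subset(target, race == 1 & RIAGENDR == 1)
print(cbind(standard = ageshare, subgroup = one$Freq / sum(one$Freq)))
```

The point of the check is that every race-by-gender cell keeps its own size but gets the same age distribution, which is what a table passed to postStratify() needs in order to standardise within subgroups rather than across the whole population.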