In the first and third examples it looks as though confint(svymean()) matches the totals and svyciprop(method="logit") matches the proportions, which is what Stata says (http://www.stata.com/statalist/archive/2006-10/msg01127.html). The agreement isn't perfect for the counts in the first example, but since Stata's default numeric type is single-precision, it's accurate enough.
In the second example the issue is that svyciprop(method="logit") doesn't take the design degrees of freedom into account, and Stata does. m<-svyglm(I(val==1)~1,family=quasibinomial,design=y) expit<-function(eta) exp(eta)/(1+exp(eta)) expit(coef(m)+qt(0.975,99)*SE(m)) (Intercept) 0.2971778536 expit(coef(m)-qt(0.975,99)*SE(m)) (Intercept) 0.1287770109 I'll add the df= argument to svyciprop(method="logit"), and then it should all match. -thomas On Tue, Sep 25, 2012 at 5:56 AM, Anthony Damico <ajdam...@gmail.com> wrote: > Hi Dr. Lumley, you're obviously correct about all of that. Thank you for > cluing me into it! And sorry for overlooking that part of the > documentation. > > I'm unfortunately still struggling with matching numbers exactly, and I > foolishly provided a dataset without a weight variable - thinking there was > only one issue preventing R from matching Stata precisely. Adding the df= > to the confint() call seems to have only solved half of the problem. > > If you (or anyone else) still has any energy to look at this issue, I've > pasted three analysis examples with both Stata and R code that aim to > calculate a survey-adjusted confidence interval. > > All three examples match coefficient and standard error values precisely, > for both weighted counts and percents. In the first example, the confidence > intervals don't match for either the counts or the percents. In the second, > the CIs match for counts but not percents. In the third, CIs match for > both. Because of this erratic behavior on relatively straightforward > datasets, I'm worried that Stata is making some weird (non-reproducible) > calculations that would obviously be outside of the scope of this list. > > Thanks!! > > > # # # # # # # # > # example 1 # > # # # # # # # # > > # simple example of confidence intervals not matching between stata and R > # this example doesn't match CIs for counts or percents > > options( digits = 10 ) > library(foreign) > library(survey) > x <- read.dta( > "http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/apipop.dta" ) > x$pw <- 6194/310 > y <- svydesign( ~1 , data = x , weights = ~pw ) > > svytotal( ~factor( stype ) , y ) > confint( svytotal( ~factor( stype ) , y ) , df = degf( y ) ) > > svymean( ~factor( stype ) , y ) > confint( svymean( ~factor( stype ) , y ) , df = degf( y ) ) > > > > > > use http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/apipop, > clear > gen pw = 6194/310 > svyset [pweight=pw] > svy: tabulate stype, count se ci format (%12.3f) > svy: tabulate stype, percent se ci format (%12.3f) > > > ---------------------------------------------------------- > stype | count se lb ub > ----------+----------------------------------------------- > E | 88334.428 710.843 86940.929 89727.927 > H | 15085.386 514.508 14076.773 16094.000 > M | 20340.296 582.814 19197.778 21482.814 > > > -------------------------------------------------------------- > stype | percentages se lb ub > ----------+--------------------------------------------------- > E | 71.376 0.574 70.236 72.488 > H | 12.189 0.416 11.397 13.028 > M | 16.435 0.471 15.533 17.379 > > > > > # # # # # # # # > # example 2 # > # # # # # # # # > > > > # simple example of confidence intervals not matching between stata and R > # this example does match CIs for counts but does not match for percents > > options(digits=10) > library(foreign) > library(survey) > > # create a sample data frame with 100 records, containing values 1 - 5 and > weights 1 or 2 > x <- data.frame( case = 1:100 , val = 1:5 , weight = 1:2 ) > # immediately save it externally to be read into stata for comparison > write.dta( x , "c:/my directory/hundred record example.dta" ) > > # create a simple survey design, with weights only > y <- svydesign( ~1 , data = x , weights = ~weight ) > > # the confidence interval for the weighted counts do match stata > svytotal( ~I( val == 1 ) , y ) > confint( svytotal( ~I( val == 1 ) , y ) , df = degf( y ) ) > > # generate the mean & standard error -- these two match precisely.. > svymean( ~I( val == 1 ) , y ) > > # # # # # # # # # # # > > # ..but none of the confidence interval attempts match > confint( svymean( ~I( val == 1 ) , y ) ) > confint( svymean( ~I( val == 1 ) , y ) , df = degf( y ) ) > svyciprop(~I( val == 1 ) , y, method="li" , df = degf( y ) ) > svyciprop(~I( val == 1 ) , y, method="lo", df = degf( y ) ) > svyciprop(~I( val == 1 ) , y, method="as", df = degf( y ) ) > svyciprop(~I( val == 1 ) , y, method="be", df = degf( y ) ) > > # # # # # # # # # # # > > > ************************************** > > * run the R code first to create this dta file > use c:\my directory\hundred record example.dta , clear > svyset [pweight=weight] > svy: tabulate val, count se ci > svy: tabulate val, percent se ci > > Number of strata = 1 Number of obs = > 100 > Number of PSUs = 100 Population size = > 150 > Design df = > 99 > > ---------------------------------------------------------- > val | count se lb ub > ----------+----------------------------------------------- > 1 | 30 6.435 17.23 42.77 > 2 | 30 6.435 17.23 42.77 > 3 | 30 6.435 17.23 42.77 > 4 | 30 6.435 17.23 42.77 > 5 | 30 6.435 17.23 42.77 > > > Number of strata = 1 Number of obs = > 100 > Number of PSUs = 100 Population size = > 150 > Design df = > 99 > > -------------------------------------------------------------- > val | percentages se lb ub > ----------+--------------------------------------------------- > 1 | 20 4.238 12.88 29.72 > 2 | 20 4.238 12.88 29.72 > 3 | 20 4.238 12.88 29.72 > 4 | 20 4.238 12.88 29.72 > 5 | 20 4.238 12.88 29.72 > > > > > > # # # # # # # # > # example 3 # > # # # # # # # # > > > # this example matches CIs for both > > library(foreign) > library(survey) > > x <- read.dta( "http://www.stata-press.com/data/r11/nhanes2f.dta" ) > y <- svydesign( ~1 , data = x , weights = ~finalwgt ) > svymean( ~sex , y ) > confint( svymean( ~sex , y ) , df = degf( y ) ) > > mean SE > sexMale 0.47958 0.0059 > sexFemale 0.52042 0.0059 > 2.5 % 97.5 % > sexMale 0.4681058 0.4910512 > sexFemale 0.5089488 0.5318942 > > ************************************** > > use http://www.stata-press.com/data/r11/nhanes2f, clear > svyset [pweight=finalwgt] > svy: tabulate sex, percent se ci > (running tabulate on estimation sample) > > Number of strata = 1 Number of obs = > 10337 > Number of PSUs = 10337 Population size = > 117023659 > Design df = > 10336 > > -------------------------------------------------------------- > 1=male, | > 2=female | percentages se lb ub > ----------+--------------------------------------------------- > Male | 47.96 .5853 46.81 49.11 > Female | 52.04 .5853 50.89 53.19 > > > > > > > On Sun, Sep 23, 2012 at 4:30 PM, Thomas Lumley <tlum...@uw.edu> wrote: >> >> On Sat, Sep 22, 2012 at 2:51 AM, Anthony Damico <ajdam...@gmail.com> >> wrote: >> >> > Survey: Mean estimation >> > >> > Number of strata = 1 Number of obs = 183 >> > Number of PSUs = 15 Population size = 9235.4 >> > Design df = 14 >> > >> > -------------------------------------------------------------- >> > | Linearized >> > | Mean Std. Err. [95% Conf. Interval] >> > -------------+------------------------------------------------ >> > ell0 | .0218579 .0226225 -.0266624 .0703783 >> > -------------------------------------------------------------- >> >> This matches >> >> > svyciprop(~I(ell==0),dclus1,df=14,method="mean") >> 2.5% 97.5% >> I(ell == 0) 0.0218579 -0.0266624 0.07038 >> >> as does this >> >> > confint(svymean(~I(ell==0),dclus1),df=14) >> 2.5 % 97.5 % >> I(ell == 0)FALSE 0.92962174505 1.02666240796 >> I(ell == 0)TRUE -0.02666240796 0.07037825495 >> >> The df= argument is not explicitly documented in ?svyciprop, but it is >> in ?confint.svystat and ?svymean >> >> [I was slowed down a bit by the claim that the Stata intervals were >> asymmetric, but in fact they aren't] >> >> -thomas >> >> -- >> Thomas Lumley >> Professor of Biostatistics >> University of Auckland > > -- Thomas Lumley Professor of Biostatistics University of Auckland ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.