Thank you, Stas. This is helpful. A few thoughts.

1) In this linearization, I do treat N, the population size, as a known 
constant. I thought that is what svymean() and SAS proc surveymeans did as 
well. So, this is a simple univariate expansion, since I only take the 
derivative w.r.t. Y, the population total.
2) Yes, the cluster sizes do vary. I meant to mention this, but I wasn't sure 
whether it was an issue. In my first example, the comment notes that the 
data are balanced. That is because I created a second example (not included 
in this email) with an unbalanced data set where the cluster sizes vary. My 
code and svymean() gave exactly the same output on the unbalanced case as 
well.
3) There are no weights with these data. The data I am working with are test 
scores from a state. Students are clustered within schools. Entire schools were 
chosen to participate in the assessment.
4) I was thinking the finite population correction would not be needed in this 
case, but maybe I am wrong. If I did add the finite population correction, 
it would change the variance of the total, and my estimate would then differ 
from what svymean() or SAS proc surveymeans gives. That doesn't occur: as it 
stands now, my code and the built-in functions return the same variance of 
the total.
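
To make points 1 and 4 concrete, here is a minimal sketch of the fixed-N
calculation in plain Python. It is not the code from my earlier email and
not what svymean() does internally. The function names, the toy scores, and
the n_pop_clusters argument (the number of schools in the population) are
all assumptions made up for illustration. With no weights, the estimated
total is just the sum of the cluster totals, and the variance of the mean is
the variance of the total divided by N squared.

```python
def cluster_totals(clusters):
    """Sum the scores within each sampled cluster (school)."""
    return [sum(scores) for scores in clusters]


def var_total(clusters, n_pop_clusters=None):
    """With-replacement PSU variance estimate of the unweighted total.

    n_pop_clusters is a hypothetical argument: the number of clusters in
    the population. When given, the estimate is multiplied by the finite
    population correction 1 - n / n_pop_clusters.
    """
    totals = cluster_totals(clusters)
    n = len(totals)
    t_bar = sum(totals) / n
    v = n / (n - 1) * sum((t - t_bar) ** 2 for t in totals)
    if n_pop_clusters is not None:
        v *= 1 - n / n_pop_clusters
    return v


def mean_fixed_n(clusters, n_pop_clusters=None):
    """Mean and its variance when the denominator N is a known constant."""
    big_n = sum(len(scores) for scores in clusters)  # treated as fixed
    total = sum(cluster_totals(clusters))
    return total / big_n, var_total(clusters, n_pop_clusters) / big_n ** 2


# Toy, made-up data: three schools with three students each.
schools = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(mean_fixed_n(schools))
print(var_total(schools), var_total(schools, n_pop_clusters=6))
```

In this sketch the correction only rescales the variance of the total, which
is why adding it would move the result away from an estimate computed
without it.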


-----Original Message-----
From: Stas Kolenikov [mailto:[EMAIL PROTECTED]
Sent: Fri 8/15/2008 3:31 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Design-consistent variance estimate
 
Harold,

in design-based estimation, thinking in terms of "what is my
(effective) sample size" rarely works out.

First of all, unless you have a fixed sample size design, your sample
size itself is a random variable. You can hope for fixed sample sizes
with some excruciatingly controlled clinical studies, but with most
other surveys, you are at the mercy of non-response, unknown cluster
sizes, interviewer availability, all sorts of field problems. So in
ratio estimation (and estimation of the mean is ratio estimation,
mean[Y] = total[Y]/total[1]), your standard error should control for
randomness in the sample size, so your Taylor series linearization
formula should have the variance of the denominator, and then also
correlation between cluster totals of Y's and 1's.

Second, you probably have different cluster/PSU sizes. That's actually
what contributes to variability of total[1]. But at any rate that
variability invalidates simple formulae for balanced PSU sizes that
your code is using.
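
The two points above can be sketched in plain Python. This is only an
illustration under stated assumptions: unweighted data, PSUs treated as
sampled with replacement, and made-up toy scores. The function names are
hypothetical and this is not the survey package's code. The ratio version
linearizes mean[Y] = total[Y]/total[1] with the per-cluster variable
z_i = (t_i - R * m_i) / M, where t_i is the cluster total of Y, m_i the
cluster size, M the estimated total[1], and R the estimated mean.

```python
def ratio_mean_var(clusters):
    """Linearized variance of the mean, treated as a ratio of two totals."""
    totals = [sum(c) for c in clusters]   # cluster totals of Y
    sizes = [len(c) for c in clusters]    # cluster totals of 1
    n = len(clusters)
    m_hat = sum(sizes)
    r_hat = sum(totals) / m_hat
    # Linearized variable: carries the variance of the denominator and
    # its covariance with the numerator.
    z = [(t - r_hat * m) / m_hat for t, m in zip(totals, sizes)]
    z_bar = sum(z) / n
    return r_hat, n / (n - 1) * sum((zi - z_bar) ** 2 for zi in z)


def fixed_n_mean_var(clusters):
    """Variance of the mean when the denominator is treated as known."""
    totals = [sum(c) for c in clusters]
    n = len(clusters)
    big_n = sum(len(c) for c in clusters)
    t_bar = sum(totals) / n
    v_total = n / (n - 1) * sum((t - t_bar) ** 2 for t in totals)
    return sum(totals) / big_n, v_total / big_n ** 2


balanced = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unbalanced = [[1, 2], [4, 5, 6], [7, 8, 9, 10]]
for data in (balanced, unbalanced):
    print(ratio_mean_var(data), fixed_n_mean_var(data))
```

With equal cluster sizes, z_i reduces to (t_i - mean of t) / N, so the two
variance estimates coincide. With unequal sizes they generally differ, since
the fixed-N formula ignores the variability of total[1].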

Third, at least theoretically, there might be finite population
corrections, although you don't seem to specify any in your svydesign
definition. And frankly I've never seen a survey where weights were
not needed.

If you want to take a look at some references, Korn & Graubard 1999
(http://www.citeulike.org/user/ctacmo/article/553280) might be a good
starting point, they have a pretty thorough discussion of issues with
variance estimation in cluster samples. A more technical reading is
Thompson 1997 (http://www.citeulike.org/user/ctacmo/article/1036973).

On 8/15/08, Doran, Harold <[EMAIL PROTECTED]> wrote:
> Dear List:
>
>  I am working to understand some differences between the results of the
>  svymean() function in the survey package and from code I have written
>  myself. The results from svymean() also agree with results I get from
>  SAS proc surveymeans, so, this suggests I am misunderstanding something.
>

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
