Thank you, Stas. This is helpful. A few thoughts:

1) In this linearization, I do treat the population size N as a known constant. I thought that is what svymean() and SAS proc surveymeans did as well. So this is a simple univariate expansion, since I only take the derivative with respect to Y, the population total.

2) Yes, the cluster sizes do vary. I meant to mention this, but I wasn't sure whether it was an issue. In my first example, you can see the comment noting that the data are balanced. That is because I also created a second example (not included in this email) with an unbalanced data set where the cluster sizes vary. My code and svymean() gave exactly the same output on the unbalanced case as well.

3) There are no weights with these data. The data I am working with are test scores from a state. Students are clustered within schools. Entire schools were chosen to participate in the assessment.

4) I was thinking the finite population correction would not be needed in this case, but maybe I am wrong. If I did add in the finite population correction, it would change the variance of the total, and I would get a different estimate from what svymean() or SAS proc surveymeans gives, which doesn't happen. As it stands now, my code and the built-in functions return the same variance of the total.
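To make points 1 and 4 concrete, here is a minimal Python sketch (not the R code from this thread, which is not reproduced here) of the estimator described in point 1: the mean as an estimated total divided by a population size N treated as a fixed, known constant. The variance of the total uses the usual with-replacement (ultimate cluster) approximation over schools, with an optional stage-one finite population correction. The function names, the `weights` argument, and the `n_psu_population` parameter are hypothetical, and the numbers have not been checked against svymean() or PROC SURVEYMEANS.

```python
from collections import defaultdict


def weighted_cluster_totals(y, cluster, weights):
    """Sum w_i * y_i within each cluster (school); return the list of cluster totals."""
    totals = defaultdict(float)
    for yi, ci, wi in zip(y, cluster, weights):
        totals[ci] += wi * yi
    return list(totals.values())


def mean_with_known_n(y, cluster, N=None, weights=None, n_psu_population=None):
    """Mean estimated as total_hat / N, with N treated as a fixed, known constant.

    Var(total_hat) = fpc * n/(n-1) * sum_c (t_c - tbar)^2 over the n sampled
    clusters, where fpc = 1 - n / n_psu_population if a population count of
    clusters is given (assumed equal-probability selection), else 1.
    Because N is fixed, Var(mean) = Var(total_hat) / N**2.
    """
    if weights is None:
        weights = [1.0] * len(y)  # unweighted data: every weight is 1
    if N is None:
        N = sum(weights)  # hypothetical default: N = sum of the weights
    t = weighted_cluster_totals(y, cluster, weights)
    n = len(t)
    tbar = sum(t) / n
    fpc = 1.0 if n_psu_population is None else 1.0 - n / n_psu_population
    var_total = fpc * n / (n - 1) * sum((tc - tbar) ** 2 for tc in t)
    total_hat = sum(t)
    return total_hat / N, var_total / N ** 2


# Toy balanced example: three schools with two students each.
y = [1, 2, 3, 4, 5, 6]
school = ["a", "a", "b", "b", "c", "c"]
print(mean_with_known_n(y, school))  # weights of 1, no FPC
# Assumed design: 3 of 6 schools drawn with equal probability, so weight 6/3 = 2.
print(mean_with_known_n(y, school, weights=[2.0] * 6, n_psu_population=6))
```

Note that an FPC only makes sense together with weights that reflect the same sampling fraction (here 6/3 = 2 per student under the assumed equal-probability draw of schools). Adding the factor while keeping all weights at 1 mixes two inconsistent descriptions of the design.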
-----Original Message-----
From: Stas Kolenikov [mailto:[EMAIL PROTECTED]
Sent: Fri 8/15/2008 3:31 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Design-consistent variance estimate

Harold,

in design-based estimation, thinking in terms of "what is my (effective) sample size" rarely works out.

First of all, unless you have a fixed sample size design, your sample size itself is a random variable. You can hope for fixed sample sizes with some excruciatingly controlled clinical studies, but with most other surveys, you are at the mercy of non-response, unknown cluster sizes, interviewer availability, and all sorts of field problems. So in ratio estimation (and estimation of the mean is ratio estimation, mean[Y] = total[Y]/total[1]), your standard error should account for the randomness in the sample size. Your Taylor series linearization formula should therefore include the variance of the denominator, and also the correlation between the cluster totals of Y's and 1's.

Second, you probably have different cluster/PSU sizes. That's actually what contributes to the variability of total[1]. But at any rate, that variability invalidates the simple formulae for balanced PSU sizes that your code is using.

Third, at least theoretically, there might be finite population corrections, although you don't seem to specify any in your svydesign definition.

And frankly, I've never seen a survey where weights were not needed.

If you want to take a look at some references, Korn & Graubard 1999 (http://www.citeulike.org/user/ctacmo/article/553280) might be a good starting point; they have a pretty thorough discussion of issues with variance estimation in cluster samples. A more technical reading is Thompson 1997 (http://www.citeulike.org/user/ctacmo/article/1036973).

On 8/15/08, Doran, Harold <[EMAIL PROTECTED]> wrote:
> Dear List:
>
> I am working to understand some differences between the results of the
> svymean() function in the survey package and from code I have written
> myself.
> The results from svymean() also agree with results I get from
> SAS proc surveymeans, so this suggests I am misunderstanding something.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
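Stas's point about ratio estimation can be sketched in Python as follows. The mean is r = Y_hat / X_hat with Y_hat = total[Y] and X_hat = total[1], and its first-order Taylor linearization is Var(r) ≈ [Var(Y_hat) - 2 r Cov(Y_hat, X_hat) + r^2 Var(X_hat)] / X_hat^2. This is a minimal sketch under stated assumptions: all weights are 1 (as in the unweighted school data), schools are treated as sampled with replacement with no FPC, and the helper names are hypothetical. It is not the survey package's code.

```python
from collections import defaultdict


def between_cluster_cov(a, b):
    """With-replacement covariance of two sums of cluster totals:
    n/(n-1) * sum_c (a_c - abar) * (b_c - bbar)."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return n / (n - 1) * sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


def ratio_mean_linearized(y, cluster):
    """Mean as the ratio total[Y] / total[1] with a Taylor-linearized variance.

    Returns (r, var_ratio, var_fixed_denominator). The last value keeps only
    Var(Y_hat) / X_hat**2, i.e. it ignores the variability of total[1].
    All weights are assumed to be 1.
    """
    ty, t1 = defaultdict(float), defaultdict(float)
    for yi, ci in zip(y, cluster):
        ty[ci] += yi   # cluster total of Y
        t1[ci] += 1.0  # cluster total of 1, i.e. the cluster size
    ids = list(ty)
    a = [ty[c] for c in ids]
    b = [t1[c] for c in ids]
    Y_hat, X_hat = sum(a), sum(b)
    r = Y_hat / X_hat
    v_yy = between_cluster_cov(a, a)  # Var(Y_hat)
    v_11 = between_cluster_cov(b, b)  # Var(X_hat): nonzero when sizes vary
    v_y1 = between_cluster_cov(a, b)  # Cov(Y_hat, X_hat)
    var_ratio = (v_yy - 2 * r * v_y1 + r ** 2 * v_11) / X_hat ** 2
    var_fixed = v_yy / X_hat ** 2
    return r, var_ratio, var_fixed


# Toy unbalanced example: schools with 3, 2, and 1 students.
y = [1, 2, 3, 4, 5, 6]
school = ["a", "a", "a", "b", "b", "c"]
print(ratio_mean_linearized(y, school))
```

On this toy unbalanced data the two variances differ (the linearized ratio variance is larger here, because total[1] varies across schools). With equal cluster sizes, Var(X_hat) and Cov(Y_hat, X_hat) are both zero, and the two variances coincide.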