On 10/16/2008 11:43 AM, Greg Snow wrote:
I wonder if including the p-values for the normality test is the best approach in your 
animation?  The CLT does not say that the distribution of the means will be normal, just 
that it approaches normality (and therefore may be a decent approximation).  The 
normality test can just reject the null that the data (simulated means) comes from a 
normal distribution.  Since the true distribution of the means is not normal (unless you 
use a sample size of Inf, and I for one have better things to do than wait for a computer to 
simulate several samples of size Inf) the null for the normality test is always false and 
therefore the test will always result in either saying it is not normal or a type II 
error.  The real goal is not to show normality, but to show that using the normal gives a 
"good enough" approximation.  I would prefer the bottom plot to show either the 
proportion of p-values from a normal based test on the simulated data that is less than 
alpha, or the proportion of confidence intervals based on the normal based test that 
include the true parameter.  Then the user can see when the approximation becomes 
close enough.
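
A rough sketch of the second option (coverage of intervals built from the normal approximation) might look like the following. It is only an illustration under stated assumptions: the population is taken to be Exp(1), so the true mean is 1, the interval is the usual mean +/- z * sd/sqrt(n), and the function name coverage() and its arguments are made up for this example.

```r
# Sketch: how often does a nominal 95% normal-based interval for the mean
# cover the true mean when the population is Exp(1)?  (Assumed setup.)
set.seed(1)

coverage <- function(n, reps = 5000, true_mean = 1, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)            # normal quantile, about 1.96
  hits <- replicate(reps, {
    x <- rexp(n, rate = 1 / true_mean)       # one sample of size n
    half <- z * sd(x) / sqrt(n)              # half-width of the interval
    abs(mean(x) - true_mean) <= half         # TRUE if the interval covers
  })
  mean(hits)                                 # proportion of covering intervals
}

sizes <- c(5, 10, 20, 50, 100, 500)
cover <- sapply(sizes, coverage)
print(data.frame(n = sizes, coverage = round(cover, 3)))
```

Plotting cover against sizes with a horizontal line at 0.95 would show the reader directly how large n has to be before the approximation is "good enough" for this particular population.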

But the p-value is not the test. The test comes later, when you interpret the p-value. So there's no such thing as a Type II error in a p-value. The demo does show that for n < 20 (or whatever), the test is very likely to reject the null. After that, it becomes less and less likely.
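
To put rough numbers on "very likely to reject", one can estimate the rejection rate by simulation. This is a sketch, not what the demo does: it assumes an exponential population, 200 means per test, and alpha = 0.05, and reject_once() is a hypothetical helper. The n at which rejections become rare depends on all of those choices.

```r
# Sketch: estimated rejection rate of the Shapiro-Wilk test applied to
# m simulated sample means, for several sample sizes n.  (Assumed setup.)
set.seed(2)

# Simulate m means of n draws each from FUN, test them once, and report
# whether the p-value falls below alpha.
reject_once <- function(n, m = 200, FUN = rexp, alpha = 0.05) {
  means <- colMeans(matrix(FUN(n * m), nrow = n))  # one mean per column
  shapiro.test(means)$p.value < alpha
}

sizes <- c(2, 5, 20, 50, 200)
reject_rate <- sapply(sizes, function(n) mean(replicate(100, reject_once(n))))
print(data.frame(n = sizes, reject_rate = reject_rate))
```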

My suggestion (and this is a matter of taste) would be to do the tests independently, rather than using the same dataset plus new observations each time. It is hard to understand the behaviour of p-values even without complicating things by giving a correlated sequence of them.

And this is even more a matter of taste: I'd plot the p-values as points, not as vertical bars. Showing that a p-value of 0.8 is twice as big as a p-value of 0.4 isn't useful for interpreting them.
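
Both suggestions can be sketched together in a few lines. The settings here (exponential population, 300 means per test, n from 1 to 50) are assumptions made for illustration only, not the defaults of the demo.

```r
# Sketch: one independent Shapiro-Wilk test per sample size n,
# with the p-values drawn as points rather than vertical bars.
set.seed(3)

sizes <- 1:50   # sample sizes n
m <- 300        # number of simulated means per test

pvals <- sapply(sizes, function(n) {
  # Fresh data for every n, so successive p-values are independent.
  means <- colMeans(matrix(rexp(n * m), nrow = n))
  shapiro.test(means)$p.value
})

plot(sizes, pvals, type = "p", pch = 19, ylim = c(0, 1),
     xlab = "sample size n", ylab = "Shapiro-Wilk p-value")
abline(h = 0.05, lty = 2)   # conventional 5% reference line
```

Replacing type = "p" with type = "h" gives the vertical-bar version for comparison.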

Duncan Murdoch



What is your target audience for this demo?  In my opinion, anyone who could 
understand the bottom plot should already understand the CLT well enough not to need 
the demo, while those I would aim the demo at would just be confused by the 
current bottom plot.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
project.org] On Behalf Of Yihui Xie
Sent: Wednesday, October 15, 2008 10:51 PM
To: roger koenker
Cc: r-help
Subject: Re: [R] plot - central limit theorem

Thanks, Roger, your demo is interesting. I'm thinking about improving
it later.

I've also made a demo for the CLT in my package 'animation', in which
there's also normality testing for the sample means, because I don't
think "bell-shaped" alone means normality - so I performed the
Shapiro-Wilk test and plotted the P-values under the demo. See the
function clt.ani() in the package 'animation', or
http://animation.yihui.name/prob:central_limit_theorem

You can use any function to denote the population (specify the
argument 'FUN') in clt.ani().

Regards,
Yihui
--
Yihui Xie <[EMAIL PROTECTED]>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Thu, Oct 16, 2008 at 4:22 AM, roger koenker <[EMAIL PROTECTED]>
wrote:
> Galton's 19th century mechanical version of this is the quincunx.  I have a
> (very primitive) version of this for R at:
>
> http://www.econ.uiuc.edu/~roger/courses/476/routines/quincunx.R
>
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [EMAIL PROTECTED]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Champaign, IL 61820
>
>
>
>> Jörg Groß wrote:
>>>
>>> Hi,
>>>
>>>
>>> Is there a way to simulate a population with R and pull out m samples,
>>> each with n values
>>> for calculating m means?
>>>
>>> I need that kind of data to plot a graphic, demonstrating the central
>>> limit theorem
>>> and I don't know how to begin.
>>>
>>> So, perhaps someone can give me some tips and hints how to start and
>>> which functions to use.
>>>
>>>
>>>
>>> thanks for any help,
>>> joerg
>>>
>
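
For the original question, a minimal base-R sketch could look like this. The population (100000 exponential values), m = 1000, and n = 30 are arbitrary choices for illustration, and the variable names are made up.

```r
# Sketch: draw m samples of size n from a simulated population,
# compute the m means, and compare them with the CLT approximation.
set.seed(4)

population <- rexp(100000)   # a skewed "population" (assumed for illustration)
m <- 1000                    # number of samples
n <- 30                      # values per sample

# sample() draws n values without replacement; replicate() repeats it m times.
sample_means <- replicate(m, mean(sample(population, n)))

hist(sample_means, freq = FALSE, breaks = 30,
     main = "Distribution of sample means", xlab = "mean of n values")

# Normal curve suggested by the CLT: same mean, sd shrunk by sqrt(n).
curve(dnorm(x, mean = mean(population), sd = sd(population) / sqrt(n)),
      add = TRUE, lty = 2)
```

Rerunning with a smaller n (say 2 or 5) shows the skewness of the population carrying over into the means.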

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
