This is making less and less sense to me as time goes on.

(1) The ranks of the small sample within the combined sample
have *integer* values and will not be distributed (under the null
hypothesis or otherwise) according to  a *continuous* uniform
distribution.  Hence applying qnorm() makes no sense.

(2) Why attempt to transform to normality anyway?  Just deal directly
with those ranks, which I guess would have (under the null hypothesis)
a discrete uniform distribution on {1, 2, ..., m+n} where m and n are
the sizes of the two samples.
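
To make points (1) and (2) concrete, here is a minimal R sketch. The sample
sizes, the seed, and the use of rnorm() to generate data are assumptions chosen
purely for illustration; none of them come from the original problem.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
m <- 8                           # size of the small sample (assumed)
n <- 200                         # size of the large sample (assumed)
x <- rnorm(m)                    # small sample
y <- rnorm(n)                    # large sample, drawn from the same distribution

# Ranks of the small sample within the combined sample.
r <- rank(c(x, y))[seq_len(m)]

# Continuous data have no ties, so every rank is a whole number
# between 1 and m + n.
all_integer <- all(r == round(r))
in_range <- all(r >= 1 & r <= m + n)

# Dividing by (m + n + 1) keeps the values strictly inside (0, 1), so qnorm()
# stays finite, but the result can still take only m + n distinct values.
z <- qnorm(r / (m + n + 1))

print(r)
print(all_integer)
```

However the ranks are rescaled before qnorm() is applied, the transformed
values live on a finite grid, which is the sense in which they cannot be
exactly normal.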

(3) Using ranks as you do sounds to me like re-inventing some form
of non-parametric test of equality of distributions.

(4) I doubt that you will get much, if any, more power from such
rank-based tests than you would from the KS test.
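
Whether a rank-based test gains anything over KS can be checked by simulation
for any particular alternative. The following is only a sketch under stated
assumptions: the Wilcoxon rank-sum test stands in for "some rank-based test",
and the sample sizes, the location shift of 0.8, the number of replicates, and
the 5% level are all invented for illustration. The answer will depend on the
kind of difference between the distributions.

```r
set.seed(42)       # arbitrary seed
m <- 10            # small sample size (assumed)
n <- 200           # large sample size (assumed)
shift <- 0.8       # assumed location shift under the alternative
nsim <- 500        # number of simulated data sets
alpha <- 0.05      # nominal significance level

# One simulated data set: return the p-values of both tests.
one_rep <- function() {
  x <- rnorm(m, mean = shift)
  y <- rnorm(n)
  c(ks = ks.test(x, y)$p.value,
    wilcoxon = wilcox.test(x, y)$p.value)
}

pvals <- replicate(nsim, one_rep())   # 2 x nsim matrix of p-values

# Estimated power: proportion of replicates in which each test rejects.
power <- rowMeans(pvals < alpha)
print(power)
```

Replacing the rnorm() calls with draws that differ in spread or shape rather
than location would show how each test fares against other alternatives.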

(5) If your test, whatever it is, lacks the power to detect the fact
that the two samples are from different distributions, then almost
surely any analysis that you do which is based on the assumption
that the two distributions are the same will be as "correct" as it
can possibly be.  If the data do not contain information which distinguishes
the two distributions, then you might as well analyse the data as if
there is only one distribution.  If the information content ain't there,
it ain't there.

(6) What *practical* knowledge about a real phenomenon would be
revealed if a test rejected the hypothesis that the distributions underlying
the two samples were equal?

    cheers,

        Rolf Turner

On 12/11/12 15:12, Herschtal Alan wrote:
Thanks for your response. The background is that I am trying to test
whether a small sample and a much larger sample actually came from the
same distribution. I could just perform a KS test on the 2 samples, but
as I said, ideally I'd like a test that is more powerful than that. So I
look at the percentile ranks of the small sample within the large
sample, which should be uniformly distributed if the 2 samples are from
the same population, and then transform using "qnorm". The result should
be standard normal. Perhaps the next best alternative is to do a
chi-square test on the percentiles, checking for equal numbers in each
decile bin. This would certainly work, and the only disadvantage that I
can see is that the selection of the bin boundaries is somewhat
arbitrary.
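
The decile-bin idea described above might be sketched in R roughly as follows.
This is an illustration only: the sample sizes and the rnorm() data are
assumptions, and the percentile ranks are taken from the empirical CDF of the
large sample, which is one of several reasonable definitions.

```r
set.seed(7)                    # arbitrary seed
small <- rnorm(50)             # small sample (assumed size and distribution)
large <- rnorm(2000)           # large sample (assumed size and distribution)

# Percentile rank of each small-sample value within the large sample:
# the proportion of large-sample values less than or equal to it.
pct <- ecdf(large)(small)

# Count how many percentile ranks fall in each of the ten decile bins.
# cut() returns a factor, so empty bins still appear in the table as zeros.
bins <- cut(pct, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
counts <- table(bins)

# Chi-square goodness-of-fit test against equal expected counts per bin.
res <- chisq.test(as.vector(counts))

print(counts)
print(res$p.value)
```

With a genuinely small sample the expected count per bin drops below the usual
rule-of-thumb minimum of 5, so fewer, wider bins may be needed, which is the
arbitrariness in bin boundaries already noted.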

Alan Herschtal
Senior Biostatistician
Peter MacCallum Cancer Centre

Phone +61 3 9656 3639
Fax +61 3 9656 1420
Email alan.hersch...@petermac.org

-----Original Message-----
From: Rolf Turner [mailto:rolf.tur...@xtra.co.nz]
Sent: Friday, 9 November 2012 2:17 PM
To: Herschtal Alan
Cc: r-help@r-project.org
Subject: Re: [R] Looking for a test of standard normality


Others may correct me, but I cannot imagine any test of standard normality
giving appreciably more power than is given by the Kolmogorov-Smirnov test.
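
For reference, a one-sample Kolmogorov-Smirnov test against the fully specified
N(0, 1) distribution can be run with ks.test() from the stats package. The data
below are simulated and the sample size is an assumption; this is a usage
sketch, not an analysis of anyone's real data.

```r
set.seed(3)                          # arbitrary seed
x <- rnorm(30)                       # simulated data that really are N(0, 1)
x_shift <- rnorm(30, mean = 1)       # simulated data from N(1, 1)

# The null distribution is completely specified (mean 0, sd 1), with no
# parameters estimated from the data, so the standard KS p-value applies.
res_null <- ks.test(x, "pnorm", mean = 0, sd = 1)
res_alt <- ks.test(x_shift, "pnorm", mean = 0, sd = 1)

print(res_null$p.value)
print(res_alt$p.value)
```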

I also wonder about the point of testing for (standard) normality in the
first place.  There is a quote (I think it refers to testing for
heteroscedasticity, but I believe it applies equally to testing for
normality) about such testing being analogous to going out of the harbour
in a rowing dinghy to see if it's safe for an ocean liner to put to sea.

      cheers,

          Rolf Turner

On 09/11/12 13:23, Herschtal Alan wrote:
Dear list members,

I am looking for a goodness-of-fit test that will tell me if a sample is
likely to have come from a standard normal distribution. I can find
plenty of omnibus tests for normality in the nortest package, but none
of them appear to allow me to test against the specific alternative that
the data are not standard normal. My backup option is to use a
Kolmogorov-Smirnov test, but my impression is that that is not a very
powerful test. Any suggestions?
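
The distinction drawn above, between testing normality in general and testing
standard normality specifically, can be illustrated with a short R sketch. It
uses shapiro.test() from the stats package as a stand-in for an omnibus
normality test (an assumption made here; it is not one of the nortest
functions), and the data are simulated with an assumed mean of 5 and sd of 3.

```r
set.seed(11)                           # arbitrary seed
x <- rnorm(100, mean = 5, sd = 3)      # normal, but far from standard normal
x_std <- (x - 5) / 3                   # the same data shifted and rescaled

# The Shapiro-Wilk statistic does not change when the data are shifted and
# rescaled, so it cannot tell N(5, 9) apart from N(0, 1).
w_raw <- unname(shapiro.test(x)$statistic)
w_std <- unname(shapiro.test(x_std)$statistic)
print(all.equal(w_raw, w_std))

# A KS test against the fully specified N(0, 1) does use location and scale.
ks_p <- ks.test(x, "pnorm", mean = 0, sd = 1)$p.value
print(ks_p)
```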

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
