I am late to this discussion -- I read R-help as a once-a-day summary.  A few 
comments.

1. In the gene-discovery subfield of statistics (SNP studies, etc.) there is a
huge multiple-testing problem.  In defense, the field thinks in terms of
thresholds like 1e-5 or 1e-10 rather than the .05 or .01 most of us are used
to.  In that literature, they do care about 1e-16 vs 1e-20.  We can all argue
about whether that is a sensible approach or not, but it is what it is.  I
think that this is the context of the journal's request, i.e., they want the
actual number, however you calculate it.
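
For anyone who does need to report numbers this small, here is a minimal
sketch of one way to get them in base R.  The z statistics (8.5 and 9.5) are
made-up values for illustration, and a normal reference distribution is
assumed; the point is only that the naive 1 - pnorm(z) loses the answer to
rounding, while the lower.tail and log.p arguments of pnorm do not.

```r
# Hypothetical z statistics, chosen only to land in the 1e-16 to 1e-22 range.
z <- c(8.5, 9.5)

# Naive upper tail: pnorm(z) rounds to exactly 1, so the difference is 0.
naive <- 1 - pnorm(z)

# Ask for the upper tail directly; this keeps full relative precision.
p <- pnorm(z, lower.tail = FALSE)

# Work on the log scale when the values get smaller still.
log10_p <- pnorm(z, lower.tail = FALSE, log.p = TRUE) / log(10)

print(data.frame(z = z, naive = naive, p = p, log10_p = log10_p))
```

Whether the resulting number deserves to be read as a probability is a
separate question.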

My own opinion is that these rarefied p-values are an arbitrary scale, one
that no longer has a probability interpretation.  For the central limit
theorem to be accurate that far from the mean requires a sample size that is
beyond imagination ("number of atoms in the earth" order of size).  Such a
scale may still be useful, but it's not really a probability.
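
A small illustration of the tail problem, under assumptions of my own
choosing: the data are taken to be exponential with mean 1, so the sum of
n = 100 observations is exactly Gamma(100, 1), and we look 8 standard
deviations above the mean.  This is a sketch of the general phenomenon, not
a statement about any particular study.

```r
# Assumed setup: sum of n iid exponential(1) variables is Gamma(shape = n).
n <- 100
z <- 8                      # distance from the mean in standard deviations
x <- n + z * sqrt(n)        # mean is n, standard deviation is sqrt(n)

# Exact upper tail of the sum.
p_exact <- pgamma(x, shape = n, rate = 1, lower.tail = FALSE)

# What the normal (central limit) approximation reports for the same point.
p_normal <- pnorm(z, lower.tail = FALSE)

# How many fold the two answers differ.
ratio <- p_exact / p_normal
print(c(exact = p_exact, normal = p_normal, ratio = ratio))
```

The approximation that is excellent near the center is off by several
orders of magnitude this far out, even with n = 100.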

2. The label "Fisher's exact test" has caused decades of confusion.  In this
context the word "exact" means "a particular test whose distribution can be
completely enumerated": it does not mean either "correct" or "precise".  The
original enumeration methods had limitations with respect to the sample size
or the presence of complications such as tied values; from the discussion so
far it would appear that the 'exact' argument of wilcox.test uses such a
method.  Cyrus Mehta did nice work on improved algorithms that do not have
these restrictions, methods that have been refined and expanded in the
software offerings from Cytel among others.  Perhaps someone could update R's
code to use this, but see 3 below.

My own opinion is that permutation tests are an important tool, one "wrench"
in our statistical toolbox.  But they are only one tool out of many.  I am
quite put off by arguments that purposefully conflate "exact" and "correct".
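
To make "completely enumerated" in point 2 concrete, here is a brute-force
sketch for a tiny two-sample problem with no ties.  The data values are
invented for illustration, and the enumeration via combn is my own naive
version, not the algorithm inside wilcox.test or anything from Mehta's work;
it is only feasible for very small samples.

```r
# Made-up data: two small samples, no tied values.
x <- c(1.2, 3.4, 5.6)
y <- c(2.1, 4.3, 6.5, 7.7)

r   <- rank(c(x, y))              # ranks in the pooled sample
obs <- sum(r[seq_along(x)])       # observed rank sum for the x group

# Every possible way to assign length(x) of the pooled ranks to group x,
# and the rank sum each assignment would give.
all_sums <- combn(length(r), length(x), FUN = function(idx) sum(r[idx]))

# Two-sided p-value read straight off the enumerated distribution.
p_enum <- min(1, 2 * min(mean(all_sums <= obs), mean(all_sums >= obs)))

# The built-in exact p-value for comparison.
p_r <- wilcox.test(x, y, exact = TRUE)$p.value

print(c(enumerated = p_enum, wilcox.test = p_r))
```

The two numbers agree because both come from the same enumerated reference
distribution; nothing in that agreement says the test is the "correct" one
for the question at hand.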

3. The concordance statistic C, the Wilcoxon test, and Somers' d are all the
same statistic, just written a little differently.  (Somers' d is essentially
Kendall's tau, but with a slightly different rule for ties.)  A test for C=.5
is the same as a Wilcoxon.  For a binary response C = the area under the
receiver operating characteristic curve (AUC).  The concordance command in the
survival library computes this statistic for continuous, binary, or censored
responses.  The variance is based on a jackknife argument and is computed by
organizing the data into a binary tree structure, very similar to the methods
used by Mehta; the computation is efficient for large n and is valid for ties.
Perhaps add a link in the wilcox.test help page?
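
A short sketch of the equivalence in point 3, using made-up data that
include ties.  The pair counting with outer() is a naive O(n^2) version for
illustration only, not the tree-based method described above, and the call
to survival::concordance is guarded in case the package is not installed.

```r
# Made-up data with ties both within and between the groups.
x <- c(5.1, 6.3, 6.3, 7.8, 9.0)
y <- c(3.3, 4.2, 5.1, 5.9, 6.3, 7.0)

# C by direct counting: fraction of (x, y) pairs with x > y, ties count 1/2.
d <- outer(x, y, "-")
C_pairs <- (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)

# The Wilcoxon statistic W from wilcox.test, rescaled by the number of pairs.
# exact = FALSE only avoids the warning about ties; W itself is unaffected.
W <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
C_wilcox <- W / (length(x) * length(y))

# Somers' d is the same quantity on a -1 to 1 scale.
somers_d <- 2 * C_pairs - 1

print(c(C_pairs = C_pairs, C_wilcox = C_wilcox, somers_d = somers_d))

# Optional comparison with the survival package, binary response = group.
if (requireNamespace("survival", quietly = TRUE)) {
  dat <- data.frame(value = c(x, y),
                    group = rep(c(1, 0), c(length(x), length(y))))
  print(survival::concordance(group ~ value, data = dat))
}
```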

Footnote: AUC is a special case of C but not vice versa.  People sometimes
try to extend AUC to the other data types, but IMHO with only moderate
success.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
