I am late to this discussion -- I read R-help as a once-a-day summary. A few comments.
1. In the gene-discovery subfield of statistics (SNP studies, etc.) there is a huge multiple-testing problem. In defense, the field thinks in terms of thresholds like 1e-5 or 1e-10 rather than the .05 or .01 most of us are used to. In that literature, they do care about 1e-16 vs 1e-20. We can all argue about whether that is a sensible approach or not, but it is what it is. I think that this is the context of the journal's request, i.e., they want the actual number, however you calculate it. My own opinion is that these rarefied p-values are an arbitrary scale, one that no longer has a probability interpretation. For the central limit theorem to be accurate that far from the mean requires a sample size that is beyond imagination ("number of atoms in the earth" order of magnitude). Such a scale may still be useful, but it's not really a probability.

2. The label "Fisher's exact test" has caused decades of confusion. In this context the word means "a particular test whose distribution can be completely enumerated": it does not mean either "correct" or "precise". The original enumeration methods had limitations with respect to the sample size or the presence of complications such as tied values; from the discussion so far it would appear that the 'exact' argument of wilcox.test uses such a method. Cyrus Mehta did nice work on improved algorithms that do not have these restrictions, methods that have been refined and expanded in the software offerings from Cytel among others. Perhaps someone could update R's code to use these, but see 3 below. My own opinion is that permutation tests are an important tool, one "wrench" in our statistical toolbox. But they are only one tool out of many. I am quite put off by arguments that purposefully conflate "exact" and "correct".

3. The concordance statistic C, the Wilcoxon test, and Somers' d are all the same statistic, just written a little differently.
(Somers' d is essentially Kendall's tau, but with a slightly different rule for ties.) A test for C = 0.5 is the same as a Wilcoxon test. For a binary response, C is the area under the receiver operating characteristic curve (AUC). The concordance command in the survival library computes this statistic for continuous, binary, or censored responses. The variance is based on a jackknife argument and is computed by organizing the data into a binary tree structure, very similar to the methods used by Mehta; it is efficient for large n and is valid for ties. Perhaps add a link in the wilcox.test help page?

Footnote: AUC is a special case of C but not vice versa. People sometimes try to extend AUC to the other data types, but IMHO with only moderate success.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu
"TERR-ree THUR-noh"