Thank you Kevin, their wording is "Please note that the exact p value should be provided, when possible, etc".

By "exact p-value" I believe that they do indeed mean the actual number, and not that we should specify "exact=TRUE". As we are working with 1000 genes, if I specify "exact=TRUE" on my PC, it runs out of memory:

wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value

On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.tho...@utoronto.ca> wrote:

> I have to ask. Are you sure the journal simply means by exact
> p-value that they don’t want to see a p-value given as < 0.0001, for
> example, and simply want the actual number?
>
> I cannot imagine they really meant exact as in the p-value from some exact
> distribution.
>
> --
> Kevin E. Thorpe
> Head of Biostatistics, Applied Health Research Centre (AHRC)
> Li Ka Shing Knowledge Institute of St. Michael's
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
>
> On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tan...@gmail.com> wrote:
>
> > Dear all, thank you all for comments and help.
> >
> > as far as i can see, if we have samples of 1000 records, only
> > "exact=FALSE" allows the code to run:
> >
> > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> > [1] 7.304863e-231
> >
> > if i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
> >
> > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> > (the job is terminated by OS)
> >
> > if you have any other suggestions, please let me know. thanks a lot !
> >
> > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
> >
> >> I **believe** -- if my old memory still serves -- that the "exact"
> >> specification uses a home grown version of the algorithm to calculate
> >> exact, or close approximations to the exact, permutation distribution
> >> originally developed by Cyrus Mehta, founder of StatXact software.
> >> Of course, examining the C code source would determine this, but I don't care
> >> to attempt this.
> >>
> >> If this is (no longer?) correct, please point this out.
> >>
> >> Best,
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along and
> >> sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwj...@gmail.com> wrote:
> >>
> >>> Hi Spencer,
> >>>
> >>> Thanks for your test results, I do not know the answer as I haven't
> >>> used wilcox.test for many years. I do not know if it is possible to compute
> >>> the exact distribution of the Wilcoxon rank sum statistic, but I think it
> >>> is very likely, as the documentation of `Wilcoxon` says:
> >>>
> >>> This distribution is obtained as follows. Let x and y be two random,
> >>> independent samples of size m and n. Then the Wilcoxon rank sum statistic
> >>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
> >>> x[i]. This statistic takes values between 0 and m * n, and its mean and
> >>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
> >>>
> >>> As a nice feature of the non-parametric statistic, it is usually
> >>> distribution-free so you can pick any distribution you like to compute the
> >>> same statistic. I wonder if this is the case, but I might be wrong.
> >>>
> >>> Cheers,
> >>> Jiefei
> >>>
> >>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:
> >>>
> >>>> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> >>>>> After digging into the R source, it turns out that the argument `exact` has
> >>>>> nothing to do with the numeric precision. It only affects the statistical
> >>>>> model used to compute the p-value. When `exact=TRUE` the true distribution
> >>>>> of the statistic will be used.
> >>>>> Otherwise, a normal approximation will be used.
> >>>>>
> >>>>> I think the documentation needs to be improved here: you can compute the
> >>>>> exact p-value *only* when you do not have any ties in your data. If you
> >>>>> have ties in your data you will get the p-value from the normal
> >>>>> approximation no matter what value you put in `exact`. This behavior should
> >>>>> be documented or a warning should be given when `exact=TRUE` and ties
> >>>>> are present.
> >>>>>
> >>>>> FYI, if the exact p-value is required, the `pwilcox` function will be used to
> >>>>> compute the p-value. There are no details on how it computes the p-value but
> >>>>> its C code seems to compute the probability table, so I assume it computes
> >>>>> the exact p-value from the true distribution of the statistic, not a
> >>>>> permutation or MC p-value.
> >>>>
> >>>> My example shows that it does NOT use Monte Carlo, because
> >>>> otherwise it uses some distribution. I believe the term "exact" means
> >>>> that it uses the permutation distribution, though I could be mistaken.
> >>>> If it's NOT a permutation distribution, I don't know what it is.
> >>>>
> >>>> Spencer
> >>>>>
> >>>>> Best,
> >>>>> Jiefei
> >>>>>
> >>>>> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:
> >>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> I just want to point out that the word "exact" has two meanings. It can
> >>>>>> mean the numerically accurate p-value as Bogdan asked in his first email,
> >>>>>> or it could mean the p-value calculated from the exact distribution of the
> >>>>>> statistic (in this case, the U statistic). These two are actually not related,
> >>>>>> even though they are both called "exact".
> >>>>>>
> >>>>>> Best,
> >>>>>> Jiefei
> >>>>>>
> >>>>>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:
> >>>>>>
> >>>>>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> >>>>>>>> thanks a lot, Vivek ! in other words, assuming that we work with 1000 data
> >>>>>>>> points,
> >>>>>>>>
> >>>>>>>> if we use EXACT = TRUE, it uses the normal approximation,
> >>>>>>>>
> >>>>>>>> while if EXACT=FALSE (for these large samples), it does not ?
> >>>>>>>
> >>>>>>> As David Winsemius noted, the documentation is not clear.
> >>>>>>> Consider the following:
> >>>>>>>
> >>>>>>> > set.seed(1)
> >>>>>>> > x <- rnorm(100)
> >>>>>>> > y <- rnorm(100, 2)
> >>>>>>> > wilcox.test(x, y)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
> >>>>>>> [1] 4.123875e-32
> >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
> >>>>>>> [1] 4.123875e-32
> >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>>
> >>>>>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one,
> >>>>>>> I think, is the normal approximation, which is the same as exact=FALSE.
> >>>>>>> I think that with exact=TRUE, you get a permutation distribution,
> >>>>>>> though I'm not sure.
> >>>>>>> You might try looking at "wilcox_test in package coin for exact,
> >>>>>>> asymptotic and Monte Carlo conditional p-values, including in the
> >>>>>>> presence of ties" to see if it is clearer. NOTE: R is case sensitive, so
> >>>>>>> "EXACT" is a different variable from "exact".
> >>>>>>> It is interpreted as an
> >>>>>>> optional argument, which is not recognized and therefore ignored in this
> >>>>>>> context.
> >>>>>>> Hope this helps.
> >>>>>>> Spencer
> >>>>>>>
> >>>>>>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mm...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Bogdan,
> >>>>>>>>>
> >>>>>>>>> You can also get the information from the wilcox.test function
> >>>>>>>>> help page.
> >>>>>>>>>
> >>>>>>>>> “By default (if exact is not specified), an exact p-value is computed if
> >>>>>>>>> the samples contain less than 50 finite values and there are no ties.
> >>>>>>>>> Otherwise, a normal approximation is used.”
> >>>>>>>>>
> >>>>>>>>> For more:
> >>>>>>>>>
> >>>>>>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
> >>>>>>>>>
> >>>>>>>>> Hope this helps!
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> VD
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tan...@gmail.com> wrote:
> >>>>>>>>>> Dear Peter, thanks a lot. yes, we can see a very precise p-value, and that
> >>>>>>>>>> was the request from the journal.
> >>>>>>>>>>
> >>>>>>>>>> if I may ask another question please : what is the meaning of "exact=TRUE"
> >>>>>>>>>> or "exact=FALSE" in wilcox.test ?
> >>>>>>>>>>
> >>>>>>>>>> i can see that the "numerically precise" p-values are different. thanks a
> >>>>>>>>>> lot !
> >>>>>>>>>>
> >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>> tst$p.value
> >>>>>>>>>> [1] 8.535524e-25
> >>>>>>>>>>
> >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> >>>>>>>>>> tst$p.value
> >>>>>>>>>> [1] 3.448211e-25
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <peter.langfel...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I think the answer is much simpler.
> >>>>>>>>>>> The print method for hypothesis
> >>>>>>>>>>> tests (class htest) truncates the p-values. In the above example,
> >>>>>>>>>>> instead of using
> >>>>>>>>>>>
> >>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>>>
> >>>>>>>>>>> and copying the output, just print the p-value:
> >>>>>>>>>>>
> >>>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>>> tst$p.value
> >>>>>>>>>>>
> >>>>>>>>>>> [1] 2.988368e-32
> >>>>>>>>>>>
> >>>>>>>>>>> I think this value is what the journal asks for.
> >>>>>>>>>>>
> >>>>>>>>>>> HTH,
> >>>>>>>>>>>
> >>>>>>>>>>> Peter
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
> >>>>>>>>>>> <spencer.gra...@effectivedefense.org> wrote:
> >>>>>>>>>>>> I would push back on that from two perspectives:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. I would study exactly what the journal said very
> >>>>>>>>>>>> carefully. If they mandated "wilcox.test", that function has an
> >>>>>>>>>>>> argument called "exact". If that's what they are asking, then using
> >>>>>>>>>>>> that argument gives the exact p-value, e.g.:
> >>>>>>>>>>>>
> >>>>>>>>>>>> > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Wilcoxon rank sum exact test
> >>>>>>>>>>>>
> >>>>>>>>>>>> data: rnorm(100) and rnorm(100, 2)
> >>>>>>>>>>>> W = 691, p-value < 2.2e-16
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2.
> >>>>>>>>>>>> If that's NOT what they are asking, then I'm not
> >>>>>>>>>>>> convinced what they are asking makes sense: There is no such thing
> >>>>>>>>>>>> as an "exact p value" except to the extent that certain assumptions
> >>>>>>>>>>>> hold, and all models are wrong (but some are useful), as George Box
> >>>>>>>>>>>> famously said years ago.[1] Truth only exists in mathematics, and
> >>>>>>>>>>>> that's because it's a fiction to start with ;-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hope this helps.
> >>>>>>>>>>>> Spencer Graves
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
> >>>>>>>>>>>>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
> >>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> i would appreciate having your advice on the following please :
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> in R, the wilcox.test() provides "a p-value < 2.2e-16", when we compare
> >>>>>>>>>>>>> sets of 1000 genes expression (in the genomics field).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> however, the journal asks us to provide the exact p value ...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> would it be legitimate to write : "p-value = 0" ? thanks a lot,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -- bogdan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ______________________________________________
> >>>>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>>> --
> >>>>>>>>> Vivek Das, PhD
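To make the two meanings of "exact" discussed in this thread concrete, here is a minimal sketch in R. It assumes continuous data with no ties, a two-sided test, and the default continuity correction; the variable names (p_exact, p_normal, p_hand, p_tab) are only illustrative. It reuses the x and y from Spencer's example, pulls out the unrounded p-values, rebuilds the normal-approximation value by hand from the mean m*n/2 and variance m*n*(m+n+1)/12 quoted above from the `Wilcoxon` help page, and rebuilds the exact value with pwilcox, which Jiefei mentioned.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 2)
m <- length(x)
n <- length(y)

# exact = TRUE: p-value from the exact null distribution of W (no ties here)
p_exact <- wilcox.test(x, y, exact = TRUE)$p.value

# exact = FALSE: normal approximation (the documented default when a
# sample has 50 or more finite values)
p_normal <- wilcox.test(x, y, exact = FALSE)$p.value

# Rank sum statistic W: ranks of x in the pooled sample, shifted to start at 0
w <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2

# Normal approximation by hand (assumes no ties, two-sided test,
# continuity correction of 0.5)
d <- w - m * n / 2                        # distance from the null mean
sigma <- sqrt(m * n * (m + n + 1) / 12)   # null standard deviation
z <- (d - sign(d) * 0.5) / sigma
p_hand <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Exact two-sided p-value by hand from the tabulated distribution of W
p_one <- if (w > m * n / 2) {
  pwilcox(w - 1, m, n, lower.tail = FALSE)
} else {
  pwilcox(w, m, n)
}
p_tab <- min(2 * p_one, 1)

# Report the actual numbers rather than the "< 2.2e-16" shown by print()
format(c(exact = p_exact, normal = p_normal,
         by_hand = p_hand, by_table = p_tab), digits = 7)
```

For these data Spencer reported 1.172189e-25 for the approximation and 4.123875e-32 for exact=TRUE earlier in the thread. For samples of 1000 the exact computation is the one that ran out of memory, so the normal-approximation value taken from $p.value (for example the 7.304863e-231 above) is the number that can be reported.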