Dear all, thank you all for your comments and help. As far as I can see, if we have samples of 1000 records, only "exact=FALSE" allows the code to run:
wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
[1] 7.304863e-231

If I use "exact=TRUE", it runs out of memory on my 64 GB RAM PC:

wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
(the job is terminated by the OS)

If you have any other suggestions, please let me know. Thanks a lot!

On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4...@gmail.com> wrote:

I **believe** -- if my old memory still serves -- that the "exact" specification uses a home-grown version of the algorithm originally developed by Cyrus Mehta, founder of StatXact software, to calculate the exact permutation distribution, or close approximations to it. Of course, examining the C source code would determine this, but I don't care to attempt this.

If this is (no longer?) correct, please point this out.

Best,

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwj...@gmail.com> wrote:

Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't used wilcox.test for many years. I do not know whether it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the documentation of `Wilcoxon` says:

This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.

As a nice feature of the non-parametric statistic, it is usually distribution-free, so you can pick any distribution you like to compute the same statistic.
I wonder if this is the case, but I might be wrong.

Cheers,
Jiefei

On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:

On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> After digging into the R source, it turns out that the argument `exact` has nothing to do with numeric precision. It only affects the statistical model used to compute the p-value. When `exact=TRUE`, the true distribution of the statistic is used. Otherwise, a normal approximation is used.
>
> I think the documentation needs to be improved here: you can compute the exact p-value *only* when you do not have any ties in your data. If you have ties in your data, you will get the p-value from the normal approximation no matter what value you put in `exact`. This behavior should be documented, or a warning should be given when `exact=TRUE` and ties are present.
>
> FYI, if the exact p-value is required, the `pwilcox` function is used to compute it. There are no details on how it computes the p-value, but its C code seems to compute the probability table, so I assume it computes the exact p-value from the true distribution of the statistic, not a permutation or MC p-value.

My example shows that it does NOT use Monte Carlo: repeated calls return identical p-values, so it must use some fixed distribution. I believe the term "exact" means that it uses the permutation distribution, though I could be mistaken. If it's NOT a permutation distribution, I don't know what it is.

Spencer

> Best,
> Jiefei

On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:

Hey,

I just want to point out that the word "exact" has two meanings.
It can mean the numerically accurate p-value, as Bogdan asked in his first email, or it can mean the p-value calculated from the exact distribution of the statistic (in this case, the U statistic). These two are actually unrelated, even though both are called "exact".

Best,
Jiefei

On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:

On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> Thanks a lot, Vivek! In other words, assuming that we work with 1000 data points,
>
> if we use EXACT = TRUE, it uses the normal approximation,
>
> while if EXACT=FALSE (for these large samples), it does not?

As David Winsemius noted, the documentation is not clear. Consider the following:

set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

wilcox.test(x, y)$p.value
[1] 1.172189e-25
wilcox.test(x, y)$p.value
[1] 1.172189e-25

wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32

wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I think, is the normal approximation, which is the same as exact=FALSE. I think that with exact=TRUE, you get a permutation distribution, though I'm not sure.
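To see where the two values in Spencer's example come from, here is a minimal sketch that recomputes both p-values by hand, assuming a two-sided test and continuous data with no ties: the exact one from the Wilcoxon rank sum distribution via `pwilcox()`, and the normal approximation with continuity correction (the `correct = TRUE` default) that `wilcox.test()` appears to use when `exact=FALSE`. The sample sizes (30 each), the seed, and the variable names are arbitrary choices for illustration. Note that x and y are generated once and reused; calling rnorm() inside each wilcox.test() call, as in some examples in this thread, draws new data every time, so those p-values are not directly comparable.

```r
# Sketch: recompute wilcox.test()'s exact and normal-approximation p-values
# by hand (two-sided, no ties). Sample sizes and seed are arbitrary.
set.seed(1)
x <- rnorm(30)
y <- rnorm(30, 1)
m <- length(x)
n <- length(y)

# Rank sum statistic: number of pairs (x[i], y[j]) with y[j] < x[i]
# (continuous data has no ties, so "<" and "not greater than" agree)
W <- sum(outer(x, y, ">"))

# Exact p-value from the Wilcoxon rank sum distribution:
# double the smaller tail probability, capped at 1
p_tail <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)  # P(W' >= W)
} else {
  pwilcox(W, m, n)                          # P(W' <= W)
}
p_exact <- min(2 * p_tail, 1)

# Normal approximation with continuity correction, using the mean m*n/2 and
# variance m*n*(m+n+1)/12 quoted from the `Wilcoxon` help page in this thread
z <- W - m * n / 2
z <- (z - sign(z) * 0.5) / sqrt(m * n * (m + n + 1) / 12)
p_normal <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

print(c(W = W, exact = p_exact, normal = p_normal))
```

The two numbers differ because they come from different models of the same statistic, not because of numerical precision, which is Jiefei's point about the two meanings of "exact".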
You might try looking at "wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties" to see if it is clearer. NOTE: R is case sensitive, so "EXACT" is a different variable from "exact". It is interpreted as an optional argument, which is not recognized and therefore ignored in this context.

Hope this helps.
Spencer

On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mm...@gmail.com> wrote:

Hi Bogdan,

You can also get the information from the help page of the wilcox.test function:

"By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."

For more:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html

Hope this helps!

Best,

VD

On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tan...@gmail.com> wrote:

Dear Peter, thanks a lot. Yes, we can see a very precise p-value, and that was the request from the journal.

If I may ask another question, please: what is the meaning of "exact=TRUE" or "exact=FALSE" in wilcox.test?

I can see that the "numerically precise" p-values are different. Thanks a lot!
tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value
[1] 8.535524e-25

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
tst$p.value
[1] 3.448211e-25

On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <peter.langfel...@gmail.com> wrote:

I think the answer is much simpler. The print method for hypothesis tests (class htest) truncates the p-values. In the above example, instead of using

wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the output, just print the p-value:

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value

[1] 2.988368e-32

I think this value is what the journal asks for.

HTH,

Peter

On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:

I would push back on that from two perspectives:

1. I would study exactly what the journal said very carefully. If they mandated "wilcox.test", that function has an argument called "exact". If that's what they are asking, then using that argument gives the exact p-value, e.g.:

wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

        Wilcoxon rank sum exact test

data:  rnorm(100) and rnorm(100, 2)
W = 691, p-value < 2.2e-16

2. If that's NOT what they are asking, then I'm not convinced what they are asking makes sense: There is no such thing as an "exact p value" except to the extent that certain assumptions hold, and all models are wrong (but some are useful), as George Box famously said years ago.[1] Truth only exists in mathematics, and that's because it's a fiction to start with ;-)

Hope this helps.
Spencer Graves

[1] https://en.wikipedia.org/wiki/All_models_are_wrong

On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:

<https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>

Dear all,

I would appreciate your advice on the following, please:

In R, wilcox.test() reports "p-value < 2.2e-16" when we compare sets of expression values for 1000 genes (in the genomics field).

However, the journal asks us to provide the exact p-value ...

Would it be legitimate to write "p-value = 0"? Thanks a lot,

-- bogdan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Vivek Das, PhD
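Peter's explanation of the "p-value < 2.2e-16" output can be checked with a short sketch, reusing the seed and samples from Spencer's example in this thread. The `htest` print method formats the p-value with `format.pval()`, whose default `eps` is `.Machine$double.eps` (about 2.2e-16), so any smaller value is displayed as "< ..." while the result object still stores the full double-precision value in `$p.value`. The names `res`, `out`, and `printed` are only illustrative.

```r
# Sketch: "p-value < 2.2e-16" is a display threshold, not the p-value itself.
# Seed and samples follow Spencer's example in this thread.
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

res <- wilcox.test(x, y)  # 100 values per sample: normal approximation by default

# The printed report, exactly as the console shows it
out <- capture.output(print(res))
print(grep("p-value", out, value = TRUE))

# format.pval() shows anything below .Machine$double.eps as "< ..."
printed <- format.pval(res$p.value)
print(printed)
print(.Machine$double.eps)

# The full double-precision value stored in the result object
# (Spencer reports 1.172189e-25 for this seed); tiny, but not zero
print(res$p.value)
```

This also suggests that writing "p-value = 0" would misreport the result: the computed value is positive, just below the print threshold.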
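Two other behaviours described in the thread can be checked directly: Spencer's note that a misspelled `EXACT=` argument is silently absorbed by `...` and ignored, and Jiefei's observation that with tied values `exact=TRUE` falls back to the normal approximation. The sketch below assumes small samples (20 each, so the default is already the exact test per the help-page quote) and creates ties by rounding; the seed, sizes, and variable names are arbitrary. As far as I know, R emits a warning ("cannot compute exact p-value with ties") in the tied case; it is suppressed here.

```r
# Sketch 1: "EXACT" (upper case) is not an argument of wilcox.test(), so it is
# swallowed by "..." and ignored. With fewer than 50 values and no ties,
# the default is already the exact test.
set.seed(2)
x <- rnorm(20)
y <- rnorm(20, 1)

p_default <- wilcox.test(x, y)$p.value                 # exact by default here
p_upper   <- wilcox.test(x, y, EXACT = FALSE)$p.value  # EXACT ignored: still exact
p_normal  <- wilcox.test(x, y, exact = FALSE)$p.value  # normal approximation

# Sketch 2: with ties, exact = TRUE cannot be honoured and the normal
# approximation is used instead (R warns about this; suppressed here).
xt <- round(x)  # rounding to integers creates ties
yt <- round(y)
p_ties_exact  <- suppressWarnings(wilcox.test(xt, yt, exact = TRUE)$p.value)
p_ties_normal <- suppressWarnings(wilcox.test(xt, yt, exact = FALSE)$p.value)

print(c(default = p_default, EXACT_FALSE = p_upper, exact_FALSE = p_normal))
print(c(ties_exact_TRUE = p_ties_exact, ties_exact_FALSE = p_ties_normal))
```

If exact p-values in the presence of ties are needed, the thread's pointer to `wilcox_test` in package coin is the documented route.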