Thanks a lot, Jiefei! And thanks to all for your time and comments! Have a good weekend!
On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:

> Hi Bogdan,
>
> I think the journal is asking for the exact value of the p-value; it
> doesn't matter whether it comes from the exact distribution or the normal
> approximation. However, it does not make any sense to report such a small
> p-value. If I were you, I would show the reviewers the exact p-value they want
> and gently explain why you did not put it into your paper. If they insist
> that the number must be in the paper, then go ahead and do it.
>
> Best,
> Jiefei
>
> On Sat, Mar 20, 2021 at 2:39 AM Bogdan Tanasa <tan...@gmail.com> wrote:
>
>> Thank you Kevin, their wording is "Please note that the exact p value
>> should be provided, when possible, etc"
>>
>> By "exact p-value" I believe they do indeed mean the actual number,
>> and not specifying "exact=TRUE";
>>
>> as we are working with 1000 genes, if I specify "exact=TRUE" on my PC,
>> it runs out of memory ...
>>
>> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>>
>> On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.tho...@utoronto.ca>
>> wrote:
>>
>> > I have to ask: are you sure the journal simply means by exact
>> > p-value that they don't want to see a p-value given as < 0.0001, for
>> > example, and simply want the actual number?
>> >
>> > I cannot imagine they really meant exact as in the p-value from some
>> > exact distribution.
>> >
>> > --
>> > Kevin E. Thorpe
>> > Head of Biostatistics, Applied Health Research Centre (AHRC)
>> > Li Ka Shing Knowledge Institute of St. Michael's
>> > Assistant Professor, Dalla Lana School of Public Health
>> > University of Toronto
>> > email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
>> >
>> > > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tan...@gmail.com> wrote:
>> > >
>> > > Dear all, thank you all for comments and help.
>> > >
>> > > As far as I can see, with samples of 1000 records, only
>> > > "exact=FALSE" allows the code to run:
>> > >
>> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
>> > > [1] 7.304863e-231
>> > >
>> > > If I use "exact=TRUE", it runs out of memory on my 64GB RAM PC:
>> > >
>> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>> > > (the job is terminated by OS)
>> > >
>> > > If you have any other suggestions, please let me know. Thanks a lot!
>> > >
>> > > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>> > >
>> > >> I **believe** -- if my old memory still serves -- that the "exact"
>> > >> specification uses a home grown version of the algorithm to calculate
>> > >> exact, or close approximations to the exact, permutation distribution
>> > >> originally developed by Cyrus Mehta, founder of StatXact software. Of
>> > >> course, examining the C code source would determine this, but I don't
>> > >> care to attempt this.
>> > >>
>> > >> If this is (no longer?) correct, please point this out.
>> > >>
>> > >> Best,
>> > >>
>> > >> Bert Gunter
>> > >>
>> > >> "The trouble with having an open mind is that people keep coming along
>> > >> and sticking things into it."
>> > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> > >>
>> > >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwj...@gmail.com> wrote:
>> > >>
>> > >>> Hi Spencer,
>> > >>>
>> > >>> Thanks for your test results. I do not know the answer, as I haven't
>> > >>> used wilcox.test for many years. I do not know if it is possible to
>> > >>> compute the exact distribution of the Wilcoxon rank sum statistic, but I
>> > >>> think it is very likely, as the documentation of `Wilcoxon` says:
>> > >>>
>> > >>> This distribution is obtained as follows. Let x and y be two random,
>> > >>> independent samples of size m and n.
>> > >>> Then the Wilcoxon rank sum statistic
>> > >>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater
>> > >>> than x[i]. This statistic takes values between 0 and m * n, and its mean
>> > >>> and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
>> > >>>
>> > >>> As a nice feature of the non-parametric statistic, it is usually
>> > >>> distribution-free so you can pick any distribution you like to compute
>> > >>> the same statistic. I wonder if this is the case, but I might be wrong.
>> > >>>
>> > >>> Cheers,
>> > >>> Jiefei
>> > >>>
>> > >>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
>> > >>> spencer.gra...@effectivedefense.org> wrote:
>> > >>>
>> > >>>> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
>> > >>>>> After digging into the R source, it turns out that the argument `exact`
>> > >>>>> has nothing to do with the numeric precision. It only affects the
>> > >>>>> statistical model used to compute the p-value. When `exact=TRUE`, the
>> > >>>>> true distribution of the statistic will be used. Otherwise, a normal
>> > >>>>> approximation will be used.
>> > >>>>>
>> > >>>>> I think the documentation needs to be improved here: you can compute the
>> > >>>>> exact p-value *only* when you do not have any ties in your data. If you
>> > >>>>> have ties in your data, you will get the p-value from the normal
>> > >>>>> approximation no matter what value you put in `exact`. This behavior
>> > >>>>> should be documented, or a warning should be given when `exact=TRUE` and
>> > >>>>> ties are present.
>> > >>>>>
>> > >>>>> FYI, if the exact p-value is required, the `pwilcox` function will be
>> > >>>>> used to compute the p-value.
>> > >>>>> There are no details on how it computes the p-value, but
>> > >>>>> its C code seems to compute the probability table, so I assume it
>> > >>>>> computes the exact p-value from the true distribution of the statistic,
>> > >>>>> not a permutation or MC p-value.
>> > >>>>
>> > >>>> My example shows that it does NOT use Monte Carlo: repeated calls
>> > >>>> return identical p-values, so it uses some fixed distribution. I believe
>> > >>>> the term "exact" means that it uses the permutation distribution, though
>> > >>>> I could be mistaken. If it's NOT a permutation distribution, I don't
>> > >>>> know what it is.
>> > >>>>
>> > >>>> Spencer
>> > >>>>>
>> > >>>>> Best,
>> > >>>>> Jiefei
>> > >>>>>
>> > >>>>> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:
>> > >>>>>
>> > >>>>>> Hey,
>> > >>>>>>
>> > >>>>>> I just want to point out that the word "exact" has two meanings. It can
>> > >>>>>> mean the numerically accurate p-value, as Bogdan asked in his first
>> > >>>>>> email, or it could mean the p-value calculated from the exact
>> > >>>>>> distribution of the statistic (in this case, the U statistic). These two
>> > >>>>>> are actually not related, even though they are both called "exact".
>> > >>>>>>
>> > >>>>>> Best,
>> > >>>>>> Jiefei
>> > >>>>>>
>> > >>>>>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
>> > >>>>>> spencer.gra...@effectivedefense.org> wrote:
>> > >>>>>>
>> > >>>>>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>> > >>>>>>>> Thanks a lot, Vivek! In other words, assuming that we work with 1000
>> > >>>>>>>> data points,
>> > >>>>>>>>
>> > >>>>>>>> if we use EXACT = TRUE, it uses the normal approximation,
>> > >>>>>>>>
>> > >>>>>>>> while if EXACT=FALSE (for these large samples), it does not?
>> > >>>>>>>
>> > >>>>>>> As David Winsemius noted, the documentation is not clear.
>> > >>>>>>> Consider the following:
>> > >>>>>>>
>> > >>>>>>> > set.seed(1)
>> > >>>>>>> > x <- rnorm(100)
>> > >>>>>>> > y <- rnorm(100, 2)
>> > >>>>>>> >
>> > >>>>>>> > wilcox.test(x, y)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> > wilcox.test(x, y)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> >
>> > >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
>> > >>>>>>> [1] 4.123875e-32
>> > >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
>> > >>>>>>> [1] 4.123875e-32
>> > >>>>>>> >
>> > >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>>
>> > >>>>>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first
>> > >>>>>>> one, I think, is the normal approximation, which is the same as
>> > >>>>>>> exact=FALSE. I think that with exact=TRUE, you get a permutation
>> > >>>>>>> distribution, though I'm not sure. You might try looking at
>> > >>>>>>> "wilcox_test in package coin for exact, asymptotic and Monte Carlo
>> > >>>>>>> conditional p-values, including in the presence of ties" to see if
>> > >>>>>>> it is clearer.
>> > >>>>>>>
>> > >>>>>>> NOTE: R is case sensitive, so "EXACT" is a different variable from
>> > >>>>>>> "exact". It is interpreted as an optional argument, which is not
>> > >>>>>>> recognized and therefore ignored in this context.
>> > >>>>>>>
>> > >>>>>>> Hope this helps.
>> > >>>>>>> Spencer
>> > >>>>>>>
>> > >>>>>>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mm...@gmail.com> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Hi Bogdan,
>> > >>>>>>>>>
>> > >>>>>>>>> You can also get the information from the link to the wilcox.test
>> > >>>>>>>>> function page.
>> > >>>>>>>>>
>> > >>>>>>>>> “By default (if exact is not specified), an exact p-value is computed
>> > >>>>>>>>> if the samples contain less than 50 finite values and there are no
>> > >>>>>>>>> ties. Otherwise, a normal approximation is used.”
>> > >>>>>>>>>
>> > >>>>>>>>> For more:
>> > >>>>>>>>>
>> > >>>>>>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
>> > >>>>>>>>>
>> > >>>>>>>>> Hope this helps!
>> > >>>>>>>>>
>> > >>>>>>>>> Best,
>> > >>>>>>>>>
>> > >>>>>>>>> VD
>> > >>>>>>>>>
>> > >>>>>>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tan...@gmail.com> wrote:
>> > >>>>>>>>>> Dear Peter, thanks a lot. Yes, we can see a very precise p-value,
>> > >>>>>>>>>> and that was the request from the journal.
>> > >>>>>>>>>>
>> > >>>>>>>>>> If I may ask another question, please: what is the meaning of
>> > >>>>>>>>>> "exact=TRUE" or "exact=FALSE" in wilcox.test?
>> > >>>>>>>>>>
>> > >>>>>>>>>> I can see that the "numerically precise" p-values are different.
>> > >>>>>>>>>> Thanks a lot!
>> > >>>>>>>>>>
>> > >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>> tst$p.value
>> > >>>>>>>>>> [1] 8.535524e-25
>> > >>>>>>>>>>
>> > >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>> > >>>>>>>>>> tst$p.value
>> > >>>>>>>>>> [1] 3.448211e-25
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
>> > >>>>>>>>>> peter.langfel...@gmail.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> I think the answer is much simpler. The print method for
>> > >>>>>>>>>>> hypothesis tests (class htest) truncates the p-values.
>> > >>>>>>>>>>> In the above example,
>> > >>>>>>>>>>> instead of using
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> and copying the output, just print the p-value:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>>> tst$p.value
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [1] 2.988368e-32
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I think this value is what the journal asks for.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> HTH,
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Peter
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
>> > >>>>>>>>>>> <spencer.gra...@effectivedefense.org> wrote:
>> > >>>>>>>>>>>> I would push back on that from two perspectives:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> 1. I would study exactly what the journal said very
>> > >>>>>>>>>>>> carefully. If they mandated "wilcox.test", that function has an
>> > >>>>>>>>>>>> argument called "exact". If that's what they are asking, then using
>> > >>>>>>>>>>>> that argument gives the exact p-value, e.g.:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Wilcoxon rank sum exact test
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> data: rnorm(100) and rnorm(100, 2)
>> > >>>>>>>>>>>> W = 691, p-value < 2.2e-16
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> 2.
>> > >>>>>>>>>>>> If that's NOT what they are asking, then I'm not
>> > >>>>>>>>>>>> convinced what they are asking makes sense: There is no such thing
>> > >>>>>>>>>>>> as an "exact p value" except to the extent that certain assumptions
>> > >>>>>>>>>>>> hold, and all models are wrong (but some are useful), as George Box
>> > >>>>>>>>>>>> famously said years ago.[1] Truth only exists in mathematics, and
>> > >>>>>>>>>>>> that's because it's a fiction to start with ;-)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Hope this helps.
>> > >>>>>>>>>>>> Spencer Graves
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1]
>> > >>>>>>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
>> > >>>>>>>>>>>>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
>> > >>>>>>>>>>>>> Dear all,
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> I would appreciate having your advice on the following, please:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> in R, wilcox.test() provides "a p-value < 2.2e-16" when we compare
>> > >>>>>>>>>>>>> expression for sets of 1000 genes (in the genomics field).
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> However, the journal asks us to provide the exact p value ...
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Would it be legitimate to write: "p-value = 0"?
>> > >>>>>>>>>>>>> Thanks a lot,
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> -- bogdan
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> [[alternative HTML version deleted]]
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> ______________________________________________
>> > >>>>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>>>>>>>>>>>> PLEASE do read the posting guide
>> > >>>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>> > >>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>>>>>>> --
>> > >>>>>>>>> Vivek Das, PhD
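A footnote to the thread: the two meanings of "exact" discussed above can be checked directly. What follows is only a minimal R sketch, not code from any of the posters. It reuses Spencer's set.seed(1) samples of size 100 and rebuilds both p-values by hand from pwilcox() and pnorm(), using the mean m * n / 2 and variance m * n * (m + n + 1) / 12 quoted from the `Wilcoxon` help page. The variable names (p_exact_manual, p_normal_manual, and so on) are made up for illustration, and the hand calculation assumes no ties, the two-sided alternative, and the default continuity correction.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 2)
m <- length(x)
n <- length(y)

# Meaning 1 of "exact": the full number rather than the truncated display.
# print() shows tiny p-values as "< 2.2e-16"; the stored value is in $p.value.
tst_exact  <- wilcox.test(x, y, exact = TRUE)   # exact distribution of W
tst_normal <- wilcox.test(x, y, exact = FALSE)  # normal approximation
p_exact  <- tst_exact$p.value
p_normal <- tst_normal$p.value

# "EXACT" is not an argument of wilcox.test; it is silently ignored, so the
# default applies (normal approximation here, as both samples have >= 50 values).
p_default    <- wilcox.test(x, y)$p.value
p_misspelled <- wilcox.test(x, y, EXACT = TRUE)$p.value

# Meaning 2 of "exact": which distribution of W is used. Rebuild both by hand.
W <- unname(tst_exact$statistic)  # number of pairs with x[i] > y[j] (no ties)

# Exact two-sided p-value from the distribution function pwilcox().
tail_prob <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)
} else {
  pwilcox(W, m, n)
}
p_exact_manual <- min(2 * tail_prob, 1)

# Normal approximation with continuity correction, assuming no ties.
mu_W    <- m * n / 2
sigma_W <- sqrt(m * n * (m + n + 1) / 12)
d <- W - mu_W
z <- (d - sign(d) * 0.5) / sigma_W
p_normal_manual <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Print the full numbers, e.g. for a report.
cat(sprintf("exact:  %.3e (by hand %.3e)\n", p_exact, p_exact_manual))
cat(sprintf("normal: %.3e (by hand %.3e)\n", p_normal, p_normal_manual))
cat("EXACT= ignored:", identical(p_default, p_misspelled), "\n")
```

For samples of 1000 the thread reports that exact=TRUE exhausts memory, so only the normal-approximation branch is practical there. In either case, sprintf("%.3e", p) is one way to write out the actual number instead of "< 2.2e-16".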