Re: [R] about a p-value < 2.2e-16

Bogdan Tanasa Fri, 19 Mar 2021 22:13:04 -0700

Thanks a lot, Jiefei! And thanks to all for your time and comments!

Have a good weekend!





On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:

> Hi Bogdan,
>
> I think the journal is asking for the exact value of the p-value; it
> doesn't matter whether it comes from the exact distribution or the normal
> approximation. However, it does not make any sense to report such a small
> p-value. If I were you, I would show the reviewers the exact p-value they
> want and gently explain why you did not put it into your paper. If they
> insist that the number must be in the paper, then go ahead and do it.
>
> Best,
> Jiefei
>
>
>
> On Sat, Mar 20, 2021 at 2:39 AM Bogdan Tanasa <tan...@gmail.com> wrote:
>
>> Thank you, Kevin. Their wording is: "Please note that the exact p value
>> should be provided, when possible, etc."
>>
>> By "exact p-value" I believe they do indeed mean the actual number,
>> not specifying "exact=TRUE".
>>
>> As we are working with 1000 genes, if I specify "exact=TRUE", my PC
>> runs out of memory:
>>
>> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>>
>> On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.tho...@utoronto.ca> wrote:
>>
>> > I have to ask: are you sure that, by "exact p-value", the journal
>> > simply means they don't want to see a p-value given as < 0.0001, for
>> > example, and want the actual number?
>> >
>> > I cannot imagine they really meant exact as in the p-value from some
>> > exact distribution.
>> >
>> > --
>> > Kevin E. Thorpe
>> > Head of Biostatistics,  Applied Health Research Centre (AHRC)
>> > Li Ka Shing Knowledge Institute of St. Michael's
>> > Assistant Professor, Dalla Lana School of Public Health
>> > University of Toronto
>> > email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
>> >
>> > > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tan...@gmail.com> wrote:
>> > >
>> > > Dear all, thank you all for your comments and help.
>> > >
>> > > As far as I can see, with samples of 1000 records, only
>> > > "exact=FALSE" allows the code to run:
>> > >
>> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
>> > > [1] 7.304863e-231
>> > >
>> > > If I use "exact=TRUE", it runs out of memory on my 64 GB RAM PC:
>> > >
>> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>> > > (the job is terminated by the OS)
>> > >
>> > > If you have any other suggestions, please let me know. Thanks a lot!
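For reference, the normal-approximation p-value that `exact=FALSE` returns can be reproduced by hand in a few lines. The sketch below assumes continuous data, so there are no ties and no tie correction in the variance. It also assumes the default continuity correction. The seed is arbitrary, so the number will not match the 7.304863e-231 above exactly.

```r
# Sketch: reproduce wilcox.test()'s normal-approximation p-value by hand.
# Assumptions: no ties (continuous data) and the default correct = TRUE.
set.seed(42)                                    # arbitrary seed
x <- rnorm(1000)
y <- rnorm(1000, 2)

m <- length(x)
n <- length(y)
r <- rank(c(x, y))
W <- sum(r[seq_len(m)]) - m * (m + 1) / 2       # rank-sum statistic W

mu    <- m * n / 2                              # E[W] under the null
sigma <- sqrt(m * n * (m + n + 1) / 12)         # SD[W] under the null, no ties
z     <- (W - mu - sign(W - mu) * 0.5) / sigma  # continuity-corrected z-score

p_manual <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))
p_r      <- wilcox.test(x, y, exact = FALSE)$p.value

all.equal(p_manual, p_r)
```

The calculation needs only a mean, a standard deviation and one `pnorm()` call. That is why it runs instantly for 1000 vs 1000 observations.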
>> > >
>> > > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>> > >
>> > >> I **believe** -- if my old memory still serves -- that the "exact"
>> > >> specification uses a home-grown version of the algorithm, originally
>> > >> developed by Cyrus Mehta (founder of the StatXact software), for
>> > >> calculating the exact permutation distribution, or close
>> > >> approximations to it. Of course, examining the C source code would
>> > >> determine this, but I don't care to attempt this.
>> > >>
>> > >> If this is (no longer?) correct, please point this out.
>> > >>
>> > >> Best,
>> > >>
>> > >> Bert Gunter
>> > >>
>> > >> "The trouble with having an open mind is that people keep coming along
>> > >> and sticking things into it."
>> > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
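We have not checked whether R's C code really follows Mehta's algorithm. Whatever it uses, the exact null distribution of the rank-sum statistic can be built with a simple counting recursion. The sketch below is our own R illustration, not R's implementation. `wilcox_counts()` is a hypothetical helper name, and the sample sizes are arbitrary. The result is compared against `dwilcox()`.

```r
# Exact null distribution of the Wilcoxon rank-sum statistic W via the
# counting recursion c(k; m, n) = c(k - n; m - 1, n) + c(k; m, n - 1).
# The largest pooled observation is either an x (it exceeds all n y's, adding
# n to W) or a y (it exceeds no x, adding 0). Illustration only, not R's C code.
wilcox_counts <- function(m, n) {
  # tab[[i + 1, j + 1]]: counts of W = 0, ..., i * j for i x's and j y's
  tab <- vector("list", (m + 1) * (n + 1))
  dim(tab) <- c(m + 1, n + 1)
  for (i in 0:m) {
    for (j in 0:n) {
      if (i == 0 || j == 0) {
        tab[[i + 1, j + 1]] <- 1                # only W = 0 is possible
      } else {
        counts <- numeric(i * j + 1)
        a <- tab[[i, j + 1]]                    # largest is an x: shift W by j
        counts[seq_along(a) + j] <- counts[seq_along(a) + j] + a
        b <- tab[[i + 1, j]]                    # largest is a y: no shift
        counts[seq_along(b)] <- counts[seq_along(b)] + b
        tab[[i + 1, j + 1]] <- counts
      }
    }
  }
  tab[[m + 1, n + 1]]
}

m <- 6
n <- 8
counts <- wilcox_counts(m, n)
probs  <- counts / choose(m + n, m)             # all arrangements equally likely
all.equal(probs, dwilcox(0:(m * n), m, n))
```

This naive table stores about m * n count vectors, each up to m * n + 1 long, so it grows roughly like (m * n)^2 / 4 numbers in total. That hints, without proving anything about R's own code, at why exact=TRUE runs out of memory for 1000 vs 1000 observations.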
>> > >>
>> > >>
>> > >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwj...@gmail.com> wrote:
>> > >>
>> > >>> Hi Spencer,
>> > >>>
>> > >>> Thanks for your test results. I do not know the answer, as I haven't
>> > >>> used wilcox.test for many years. I do not know whether it is possible
>> > >>> to compute the exact distribution of the Wilcoxon rank sum statistic,
>> > >>> but I think it is very likely, as the documentation of `Wilcoxon` says:
>> > >>>
>> > >>> This distribution is obtained as follows. Let x and y be two random,
>> > >>> independent samples of size m and n. Then the Wilcoxon rank sum
>> > >>> statistic is the number of all pairs (x[i], y[j]) for which y[j] is
>> > >>> not greater than x[i]. This statistic takes values between 0 and
>> > >>> m * n, and its mean and variance are m * n / 2 and
>> > >>> m * n * (m + n + 1) / 12, respectively.
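The mean and variance quoted from `?Wilcoxon` can be checked numerically against the exact probabilities from `dwilcox()`. This is a small sketch; the sample sizes m = 7 and n = 5 are arbitrary.

```r
# Check E[W] = m*n/2 and Var[W] = m*n*(m+n+1)/12 against the exact null
# distribution returned by dwilcox(); m and n are arbitrary small sizes.
m <- 7
n <- 5
k <- 0:(m * n)                 # support of W: 0, 1, ..., m*n
p <- dwilcox(k, m, n)          # exact null probabilities P(W = k)

mean_W <- sum(k * p)
var_W  <- sum((k - mean_W)^2 * p)

all.equal(mean_W, m * n / 2)
all.equal(var_W, m * n * (m + n + 1) / 12)
```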
>> > >>>
>> > >>> A nice feature of a non-parametric statistic is that it is usually
>> > >>> distribution-free, so you can pick any distribution you like and the
>> > >>> distribution of the statistic stays the same. I wonder if this is the
>> > >>> case here, but I might be wrong.
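The test works only with ranks, which is easy to see directly. A strictly increasing transformation of the data leaves every rank unchanged, and with it W and the p-value. For continuous data, the null distribution of W also depends only on m and n. This is a short sketch; `exp()` and the sample sizes are arbitrary choices.

```r
# Ranks are invariant under strictly increasing transformations, so the
# Wilcoxon statistic and its p-value should not change.
set.seed(1)
x <- rnorm(30)
y <- rnorm(30, 1)

t_raw <- wilcox.test(x, y)             # n < 50, no ties: exact p-value by default
t_exp <- wilcox.test(exp(x), exp(y))   # same ranks after a monotone transform

identical(unname(t_raw$statistic), unname(t_exp$statistic))
identical(t_raw$p.value, t_exp$p.value)
```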
>> > >>>
>> > >>> Cheers,
>> > >>> Jiefei
>> > >>>
>> > >>>
>> > >>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:
>> > >>>
>> > >>>>
>> > >>>>
>> > >>>> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
>> > >>>>> After digging into the R source, it turns out that the argument
>> > >>>>> `exact` has nothing to do with numeric precision. It only affects
>> > >>>>> the statistical model used to compute the p-value. When
>> > >>>>> `exact=TRUE`, the true distribution of the statistic will be used.
>> > >>>>> Otherwise, a normal approximation will be used.
>> > >>>>>
>> > >>>>> I think the documentation needs to be improved here: you can
>> > >>>>> compute the exact p-value *only* when there are no ties in your
>> > >>>>> data. If you have ties, you will get the p-value from the normal
>> > >>>>> approximation no matter what value you pass to `exact`. This
>> > >>>>> behavior should be documented, or a warning should be given when
>> > >>>>> `exact=TRUE` and ties are present.
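The tie behaviour described above can be checked directly. The sketch below uses toy data with deliberate ties, chosen arbitrarily, and compares the p-value returned with `exact=TRUE` to the one from `exact=FALSE`. In the R versions we are aware of, a warning along the lines of "cannot compute exact p-value with ties" is emitted. Rather than assume that, the sketch records whether a warning was raised.

```r
# Toy data with ties: the value 2 appears in both samples, 6 twice in y.
x <- c(1, 2, 2, 3, 5)
y <- c(2, 4, 6, 6, 7)

warned <- FALSE
p_exact_requested <- withCallingHandlers(
  wilcox.test(x, y, exact = TRUE)$p.value,
  warning = function(w) {
    message("wilcox.test warned: ", conditionMessage(w))
    warned <<- TRUE                    # remember that a warning occurred
    invokeRestart("muffleWarning")     # and carry on
  }
)
p_normal <- wilcox.test(x, y, exact = FALSE)$p.value

identical(p_exact_requested, p_normal) # same normal-approximation value
warned
```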
>> > >>>>>
>> > >>>>> FYI, if the exact p-value is required, the `pwilcox` function is
>> > >>>>> used to compute it. There are no details on how it computes the
>> > >>>>> p-value, but its C code seems to compute a probability table, so I
>> > >>>>> assume it computes the exact p-value from the true distribution of
>> > >>>>> the statistic, not a permutation or Monte Carlo p-value.
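The exact two-sided p-value can be obtained from `pwilcox()`. Take the tail probability on the side of the observed W, double it, and cap it at 1. This appears to mirror what `wilcox.test()` does for untied data, but treat the sketch as our reconstruction, not a quote of R's source. The data are arbitrary simulated values.

```r
set.seed(2)                                          # arbitrary seed
x <- rnorm(20)
y <- rnorm(25, 1)
m <- length(x)
n <- length(y)

# Rank-sum statistic W, as reported by wilcox.test()
W <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2

p_lower  <- pwilcox(W, m, n)                         # P(W <= w)
p_upper  <- pwilcox(W - 1, m, n, lower.tail = FALSE) # P(W >= w)
p_manual <- min(2 * if (W > m * n / 2) p_upper else p_lower, 1)

tst <- wilcox.test(x, y, exact = TRUE)
all.equal(W, unname(tst$statistic))
all.equal(p_manual, tst$p.value)
```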
>> > >>>>
>> > >>>>
>> > >>>>       My example shows that it does NOT use Monte Carlo: repeated
>> > >>>> calls return identical p-values, so it must use some fixed
>> > >>>> distribution. I believe the term "exact" means that it uses the
>> > >>>> permutation distribution, though I could be mistaken. If it's NOT a
>> > >>>> permutation distribution, I don't know what it is.
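For small samples, one can check that the "exact" distribution is indeed the permutation distribution of the ranks. Under the null hypothesis, with no ties, every assignment of the pooled ranks 1, ..., m + n to the x sample is equally likely. The sketch enumerates all of them with `combn()` and compares the result with `dwilcox()`; m = 4 and n = 5 are arbitrary.

```r
m <- 4
n <- 5
N <- m + n

# Each column of idx is one possible set of ranks for the x sample.
idx <- combn(N, m)                          # choose(9, 4) = 126 arrangements
W_perm <- colSums(idx) - m * (m + 1) / 2    # rank-sum statistic per arrangement

# Relative frequency of each value 0..m*n of W across all arrangements
perm_dist <- tabulate(W_perm + 1, nbins = m * n + 1) / ncol(idx)

all.equal(perm_dist, dwilcox(0:(m * n), m, n))
```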
>> > >>>>
>> > >>>>
>> > >>>>       Spencer
>> > >>>>>
>> > >>>>> Best,
>> > >>>>> Jiefei
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwj...@gmail.com> wrote:
>> > >>>>>
>> > >>>>>> Hey,
>> > >>>>>>
>> > >>>>>> I just want to point out that the word "exact" has two meanings.
>> > >>>>>> It can mean the numerically accurate p-value, as Bogdan asked in
>> > >>>>>> his first email, or it can mean the p-value calculated from the
>> > >>>>>> exact distribution of the statistic (in this case, the U
>> > >>>>>> statistic). These two are actually unrelated, even though both
>> > >>>>>> are called "exact".
>> > >>>>>>
>> > >>>>>> Best,
>> > >>>>>> Jiefei
>> > >>>>>>
>> > >>>>>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:
>> > >>>>>>
>> > >>>>>>>
>> > >>>>>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>> > >>>>>>>> Thanks a lot, Vivek! In other words, assuming that we work with
>> > >>>>>>>> 1000 data points:
>> > >>>>>>>>
>> > >>>>>>>> if we use EXACT = TRUE, it uses the normal approximation,
>> > >>>>>>>>
>> > >>>>>>>> while with EXACT=FALSE (for these large samples), it does not?
>> > >>>>>>>
>> > >>>>>>>        As David Winsemius noted, the documentation is not clear.
>> > >>>>>>> Consider the following:
>> > >>>>>>>
>> > >>>>>>> set.seed(1)
>> > >>>>>>> x <- rnorm(100)
>> > >>>>>>> y <- rnorm(100, 2)
>> > >>>>>>>
>> > >>>>>>> wilcox.test(x, y)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> wilcox.test(x, y)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>>
>> > >>>>>>> wilcox.test(x, y, EXACT=TRUE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> wilcox.test(x, y, EXACT=TRUE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> wilcox.test(x, y, exact=TRUE)$p.value
>> > >>>>>>> [1] 4.123875e-32
>> > >>>>>>> wilcox.test(x, y, exact=TRUE)$p.value
>> > >>>>>>> [1] 4.123875e-32
>> > >>>>>>>
>> > >>>>>>> wilcox.test(x, y, EXACT=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> wilcox.test(x, y, EXACT=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> wilcox.test(x, y, exact=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>> wilcox.test(x, y, exact=FALSE)$p.value
>> > >>>>>>> [1] 1.172189e-25
>> > >>>>>>>
>> > >>>>>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first
>> > >>>>>>> one, I think, is the normal approximation, which is the same as
>> > >>>>>>> exact=FALSE. I think that with exact=TRUE, you get a permutation
>> > >>>>>>> distribution, though I'm not sure.
>> > >>>>>>>
>> > >>>>>>> You might try looking at "wilcox_test in package coin for exact,
>> > >>>>>>> asymptotic and Monte Carlo conditional p-values, including in the
>> > >>>>>>> presence of ties" to see if it is clearer.
>> > >>>>>>>
>> > >>>>>>> NOTE: R is case sensitive, so "EXACT" is a different variable
>> > >>>>>>> from "exact". It is interpreted as an optional argument, which is
>> > >>>>>>> not recognized and therefore ignored in this context.
>> > >>>>>>>           Hope this helps.
>> > >>>>>>>           Spencer
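Spencer's point about case sensitivity can be condensed into a few checks. The misspelled `EXACT=TRUE` is absorbed by `...` and changes nothing, whereas `exact=TRUE` switches to the exact distribution. This is a sketch; it only compares results and does not rely on the specific numbers shown above.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

p_default <- wilcox.test(x, y)$p.value                # n >= 50: normal approximation
p_caps    <- wilcox.test(x, y, EXACT = TRUE)$p.value  # unknown argument, ignored
p_exact   <- wilcox.test(x, y, exact = TRUE)$p.value  # exact null distribution

identical(p_default, p_caps)   # the misspelled argument had no effect
p_exact == p_default           # exact and approximate values differ
```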
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mm...@gmail.com> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Hi Bogdan,
>> > >>>>>>>>>
>> > >>>>>>>>> You can also get this information from the help page of the
>> > >>>>>>>>> wilcox.test function.
>> > >>>>>>>>>
>> > >>>>>>>>> “By default (if exact is not specified), an exact p-value is
>> > >>>>>>>>> computed if the samples contain less than 50 finite values and
>> > >>>>>>>>> there are no ties. Otherwise, a normal approximation is used.”
>> > >>>>>>>>>
>> > >>>>>>>>> For more:
>> > >>>>>>>>>
>> > >>>>>>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
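The default rule quoted above can be checked directly: an exact p-value is used only when both samples have fewer than 50 finite values and there are no ties. This is a sketch with arbitrary simulated, tie-free data.

```r
set.seed(3)                                       # arbitrary seed
x49 <- rnorm(49); y49 <- rnorm(49, 0.5)
x50 <- rnorm(50); y50 <- rnorm(50, 0.5)

# 49 vs 49: the default should coincide with exact = TRUE
p49_default <- wilcox.test(x49, y49)$p.value
p49_exact   <- wilcox.test(x49, y49, exact = TRUE)$p.value

# 50 vs 50: the default should coincide with exact = FALSE
p50_default <- wilcox.test(x50, y50)$p.value
p50_normal  <- wilcox.test(x50, y50, exact = FALSE)$p.value

identical(p49_default, p49_exact)
identical(p50_default, p50_normal)
```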
>> > >>>>>>>>> Hope this helps!
>> > >>>>>>>>>
>> > >>>>>>>>> Best,
>> > >>>>>>>>>
>> > >>>>>>>>> VD
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tan...@gmail.com> wrote:
>> > >>>>>>>>>> Dear Peter, thanks a lot. Yes, we can see a very precise p-value,
>> > >>>>>>>>>> and that was the request from the journal.
>> > >>>>>>>>>>
>> > >>>>>>>>>> If I may ask another question, please: what is the meaning of
>> > >>>>>>>>>> "exact=TRUE" or "exact=FALSE" in wilcox.test?
>> > >>>>>>>>>>
>> > >>>>>>>>>> I can see that the "numerically precise" p-values are different.
>> > >>>>>>>>>> Thanks a lot!
>> > >>>>>>>>>>
>> > >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>> tst$p.value
>> > >>>>>>>>>> [1] 8.535524e-25
>> > >>>>>>>>>>
>> > >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>> > >>>>>>>>>> tst$p.value
>> > >>>>>>>>>> [1] 3.448211e-25
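Each call above draws fresh random numbers, so the two p-values differ partly because the data differ. The sketch below fixes the data first, with an arbitrary seed, so the only difference between the two calls is the `exact` argument.

```r
set.seed(123)                                         # arbitrary seed
x <- rnorm(100)
y <- rnorm(100, 2)

p_exact  <- wilcox.test(x, y, exact = TRUE)$p.value   # exact null distribution
p_normal <- wilcox.test(x, y, exact = FALSE)$p.value  # normal approximation

c(exact = p_exact, normal = p_normal)
```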
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <peter.langfel...@gmail.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> I think the answer is much simpler. The print method for
>> > >>>>>>>>>>> hypothesis tests (class htest) truncates the p-values. In the
>> > >>>>>>>>>>> above example, instead of using
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> and copying the output, just print the p-value:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>>> tst$p.value
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [1] 2.988368e-32
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I think this value is what the journal asks for.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> HTH,
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Peter
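The "< 2.2e-16" in the printout is a display cut-off, not the computed value. As far as we can tell, the htest print method formats the p-value with `format.pval()`, whose default `eps` is `.Machine$double.eps`. This is a sketch; the seed is arbitrary.

```r
set.seed(1)                                    # arbitrary seed
tst <- wilcox.test(rnorm(100), rnorm(100, 2), exact = TRUE)

.Machine$double.eps          # 2.220446e-16: smallest x with 1 + x != 1
format.pval(tst$p.value)     # shown as "< ..." because p is below eps
tst$p.value                  # the full double-precision value
signif(tst$p.value, 3)       # a rounded value suitable for a paper
```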
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote:
>> > >>>>>>>>>>>>         I would push back on that from two perspectives:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>               1.  I would study exactly what the journal said
>> > >>>>>>>>>>>> very carefully.  If they mandated "wilcox.test", that function
>> > >>>>>>>>>>>> has an argument called "exact".  If that's what they are
>> > >>>>>>>>>>>> asking, then using that argument gives the exact p-value, e.g.:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>           Wilcoxon rank sum exact test
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> data:  rnorm(100) and rnorm(100, 2)
>> > >>>>>>>>>>>> W = 691, p-value < 2.2e-16
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>               2.  If that's NOT what they are asking, then I'm
>> > >>>>>>>>>>>> not convinced what they are asking makes sense:  There is no
>> > >>>>>>>>>>>> such thing as an "exact p value" except to the extent that
>> > >>>>>>>>>>>> certain assumptions hold, and all models are wrong (but some
>> > >>>>>>>>>>>> are useful), as George Box famously said years ago.[1]  Truth
>> > >>>>>>>>>>>> only exists in mathematics, and that's because it's a fiction
>> > >>>>>>>>>>>> to start with ;-)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>         Hope this helps.
>> > >>>>>>>>>>>>         Spencer Graves
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1] https://en.wikipedia.org/wiki/All_models_are_wrong
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
>> > >>>>>>>>>>>>>    <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
>> > >>>>>>>>>>>>> Dear all,
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> I would appreciate your advice on the following, please:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> In R, wilcox.test() reports "p-value < 2.2e-16" when we
>> > >>>>>>>>>>>>> compare the expression of sets of 1000 genes (in the
>> > >>>>>>>>>>>>> genomics field).
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> However, the journal asks us to provide the exact p-value ...
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Would it be legitimate to write "p-value = 0"? Thanks a lot,
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> -- bogdan
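On writing "p-value = 0": a p-value from this test is a positive number, however tiny. If a computed tail probability ever comes out as exactly 0, that is floating-point underflow. The sketch below uses a hypothetical z-score of -40, not a value from this thread. It shows the underflow and one way to report the magnitude on a log scale with `pnorm(..., log.p = TRUE)`.

```r
z <- -40                                    # hypothetical, very extreme z-score

p_naive <- 2 * pnorm(z)                     # underflows to exactly 0 in doubles

# The same two-sided tail probability on the log scale stays finite
log_p   <- log(2) + pnorm(z, log.p = TRUE)  # natural log of 2 * Phi(z)
log10_p <- log_p / log(10)                  # roughly -349, i.e. p near 10^-349

p_naive
log10_p
```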
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>        [[alternative HTML version deleted]]
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> ______________________________________________
>> > >>>>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>>>>>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > >>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>>>>>>>>
>> > >>>>>>>>> --
>> > >>>>>>>>> ----------------------------------------------------------
>> > >>>>>>>>>
>> > >>>>>>>>> Vivek Das, PhD
>> > >>>>>>>>>
>> > >>>>>>>
>> > >>>>
>> > >>>>
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>> >
>>
>>
>

