Yes, that is the point that David made and that I illustrated with the simulations: the null distribution of W is narrower in the presence of ties, hence W=485 is a more extreme observation in the tied case. That is, it will look less extreme if you ignore the ties.
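The same point can be checked against the normal approximation. The following is a minimal sketch, assuming the usual textbook formulas for the standard deviation of W with and without the tie correction; the names sd_noties, sd_ties, tie_sizes and p_two_sided are illustrative only and are not taken from wilcox.test.

```r
# Data from the original question in this thread.
x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)

n1 <- length(x)
n2 <- length(y)
N  <- n1 + n2

R  <- rank(c(x, y))                            # midranks for tied values
W  <- sum(R[seq_len(n1)]) - n1 * (n1 + 1) / 2  # rank-sum statistic, as in sim above
mu <- n1 * n2 / 2                              # null mean of W

# Standard deviation of W when ties are ignored.
sd_noties <- sqrt(n1 * n2 * (N + 1) / 12)

# Standard deviation of W with the tie correction (assumed textbook form).
tie_sizes <- as.vector(table(R))               # size of each group of equal ranks
sd_ties <- sqrt(n1 * n2 / 12 *
                ((N + 1) - sum(tie_sizes^3 - tie_sizes) / (N * (N - 1))))

p_two_sided <- function(z) 2 * pnorm(-abs(z))

p_noties  <- p_two_sided((W - mu) / sd_noties)            # ties ignored
p_ties    <- p_two_sided((W - mu) / sd_ties)              # tie correction only
p_ties_cc <- p_two_sided((abs(W - mu) - 0.5) / sd_ties)   # plus continuity correction

print(c(W = W, sd_noties = sd_noties, sd_ties = sd_ties))
print(c(p_noties = p_noties, p_ties = p_ties, p_ties_cc = p_ties_cc))
```

With these data the sketch should give W = 485 and an sd_ties clearly smaller than sd_noties. p_ties_cc and p_ties should come out near the 0.0002594 and 0.0002481 that wilcox.test reported with correct=TRUE and correct=FALSE, while p_noties should land near the 0.0025 reported by the online calculators.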
-pd

On 04 Sep 2014, at 15:17 , Lorenz, David <lor...@usgs.gov> wrote:

> I think that the issue, at least with the online calculator that I looked
> at, is that it does not adjust the standard deviation of the test
> statistic for ties, so the standard deviation is larger and hence larger
> p-value. I was able to reproduce the reported z-score using the equation
> for the standard deviation without ties.
> Dave
>
> Message: 14
>> Date: Wed, 3 Sep 2014 23:20:04 +0200
>> From: peter dalgaard <pda...@gmail.com>
>> To: David L Carlson <dcarl...@tamu.edu>
>> Cc: "r-help@r-project.org" <r-help@r-project.org>,
>>         W Bradley Knox <bradk...@mit.edu>
>> Subject: Re: [R] wilcox.test - difference between p-values of R and
>>         online calculators
>> Message-ID: <ffde9637-160e-4555-9c2a-e94494700...@gmail.com>
>> Content-Type: text/plain; charset=us-ascii
>>
>> Notice that correct=TRUE for wilcox.test refers to the continuity
>> correction, not the correction for ties.
>>
>> You can fairly easily simulate from the exact distribution of W:
>>
>> x <- c(359,359,359,359,359,359,335,359,359,359,359,
>>        359,359,359,359,359,359,359,359,359,359,303,359,359,359)
>> y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
>>        355,250,105,359,359,298,359,359,359,28.6,359,359,128)
>> R <- rank(c(x,y))
>> sim <- replicate(1e6,sum(sample(R,25))) - 325
>>
>> # With no ties, the ranks would be a permutation of 1:51, and we could do
>> sim2 <- replicate(1e6,sum(sample(1:51,25))) - 325
>>
>> In either case, the p-value is the probability that W >= 485 or W <= 165,
>> and
>>
>> > mean(sim >= 485 | sim <= 165)
>> [1] 0.000151
>> > mean(sim2 >= 485 | sim2 <= 165)
>> [1] 0.002182
>>
>> Also, try
>>
>> plot(density(sim))
>> lines(density(sim2))
>>
>> and notice that the distribution of sim is narrower than that of sim2
>> (hence the smaller p-value with tie correction), but also that the normal
>> approximation is not nearly as good as for the untied case. The
>> "clumpiness" is due to the fact that 35 of the ranks have the maximum value
>> of 34 (corresponding to the original 359's).
>>
>> -pd
>>
>> On 03 Sep 2014, at 19:13 , David L Carlson <dcarl...@tamu.edu> wrote:
>>
>>> Since they all have the same W/U value, it seems likely that the
>>> difference is how the different versions adjust the standard error for
>>> ties.
>>> Here are a couple of posts addressing the issues of ties:
>>>
>>> http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
>>> http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and
>>>
>>> David C
>>>
>>> From: wbradleyk...@gmail.com [mailto:wbradleyk...@gmail.com] On Behalf Of W Bradley Knox
>>> Sent: Wednesday, September 3, 2014 9:20 AM
>>> To: David L Carlson
>>> Cc: Tal Galili; r-help@r-project.org
>>> Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators
>>>
>>> Tal and David, thanks for your messages.
>>>
>>> I should have added that I tried all variations of true/false values for
>>> the exact and correct parameters. Running with correct=FALSE makes only a
>>> tiny change, resulting in W = 485, p-value = 0.0002481.
>>>
>>> At one point, I also thought that the discrepancy between R and these
>>> online calculators might come from how ties are handled, but the fact that
>>> R and two of the online calculators reach the same U/W values seems to
>>> indicate that ties aren't the issue, since (I believe) the U or W values
>>> contain all of the information needed to calculate the p-value, assuming
>>> the number of samples is also known for each condition. (However, it's been
>>> a while since I looked into how MWU tests work, so maybe now's the time to
>>> refresh.) If that's correct, the discrepancy seems to be based in what R
>>> does with the W value that is identical to the U values of two of the
>>> online calculators. (I'm also assuming that U and W have the same meaning,
>>> which seems likely.)
>>>
>>> - Brad
>>>
>>> ____________________
>>> W. Bradley Knox, PhD
>>> http://bradknox.net
>>> bradk...@mit.edu
>>>
>>> On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>>> That does not change the results. The problem is likely to be the way
>>> ties are handled. The first sample has 25 values of which 23 are identical
>>> (359). The second sample has 26 values of which 12 are identical (359). The
>>> difference between the implementations may be a result of the way the ties
>>> are ranked. For example the R function rank() offers 5 different ways of
>>> handling the rank on tied observations. With so many ties, that could make
>>> a substantial difference.
>>>
>>> Package coin has wilcox_test() which uses Monte Carlo simulation to
>>> estimate the confidence limits.
>>>
>>> -------------------------------------
>>> David L Carlson
>>> Department of Anthropology
>>> Texas A&M University
>>> College Station, TX 77840-4352
>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tal Galili
>>> Sent: Wednesday, September 3, 2014 5:24 AM
>>> To: W Bradley Knox
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators
>>>
>>> It seems your numbers have ties. What happens if you run wilcox.test with
>>> correct=FALSE, will the results be the same as the online calculators?
>>>
>>> ----------------Contact Details:-------------------------------------------------------
>>> Contact me: tal.gal...@gmail.com |
>>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
>>> www.r-statistics.com (English)
>>> ----------------------------------------------------------------------------------------------
>>>
>>> On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox <bradk...@mit.edu> wrote:
>>>
>>>> Hi.
>>>>
>>>> I'm taking the long-overdue step of moving from using online calculators to
>>>> compute results for Mann-Whitney U tests to a more streamlined system
>>>> involving R.
>>>>
>>>> However, I'm finding that R computes a different result than the 3 online
>>>> calculators that I've used before (all of which approximately agree).
>>>> These calculators are here:
>>>>
>>>> http://elegans.som.vcu.edu/~leon/stats/utest.cgi
>>>> http://vassarstats.net/utest.html
>>>> http://www.socscistatistics.com/tests/mannwhitney/
>>>>
>>>> An example calculation is
>>>>
>>>> wilcox.test(c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,359,359,359,359,359,359,303,359,359,359),
>>>>             c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,105,359,359,298,359,359,359,28.6,359,359,128))
>>>>
>>>> which prints
>>>>
>>>> Wilcoxon rank sum test with continuity correction
>>>>
>>>> data: c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, and c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183,
>>>> 359, 359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, and 286, 355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359,
>>>> 359) and 359, 128)
>>>> W = 485, p-value = 0.0002594
>>>> alternative hypothesis: true location shift is not equal to 0
>>>>
>>>> Warning message:
>>>> In wilcox.test.default(c(359, 359, 359, 359, 359, 359, 335, 359, :
>>>>   cannot compute exact p-value with ties
>>>>
>>>> However, all of the online calculators find p-values close to 0.0025, 10x
>>>> the value output by R. All results are for a two-tailed case. Importantly,
>>>> the W value computed by R *does agree* with the U values output by the
>>>> first two online calculators listed above, yet it has a different p-value.
>>>>
>>>> Can anyone shed some light on how and why R's calculation differs from that
>>>> of these online calculators? Thanks for your time.
>>>>
>>>> ____________________
>>>> W. Bradley Knox, PhD
>>>> http://bradknox.net
>>>> bradk...@mit.edu
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd....@cbs.dk  Priv: pda...@gmail.com

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com