Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

John Fox Wed, 20 Jan 2021 15:11:25 -0800

Dear Bharat Rawlley,

On 2021-01-20 1:45 p.m., bharat rawlley via R-help wrote:

  Dear Professor John,
Thank you very much for your reply!
I agree with you that the non-parametric tests I mentioned in my previous email 
(Mood's median test and Median test) do not make sense in this situation as they 
treat PFD_n and drug_code as different groups. As you correctly said, I want to 
use PFD_n as a vector of scores and drug_code to make two groups out of it. 
This is exactly what the Independent samples median test does in SPSS. I wish 
to perform the same test in R and am unable to do so.
Simply put, I am asking how to perform the Independent samples median test in R 
just like it is performed in SPSS?

I'm afraid that I'm the wrong person to ask, since I haven't used SPSS in perhaps 30 years and have no idea what it does to test for differences in medians. A Google search for "independent samples median test in R" turns up a number of hits.
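For readers following the thread, one common form of an independent-samples median test (Mood's median test) can be sketched in base R: classify each score as above or not above the pooled median, cross-tabulate that against the grouping variable, and test the resulting table. This is only a sketch under assumptions: the function name median_test is hypothetical, the toy data are made up, and whether SPSS uses the same rule for values tied with the median, or the same test on the table, has not been checked here.

```r
# Hypothetical helper: Mood's median test via a 2 x k contingency table.
median_test <- function(score, group, exact = FALSE) {
  keep <- !is.na(score) & !is.na(group)   # drop incomplete pairs
  score <- score[keep]
  group <- factor(group[keep])
  grand_median <- median(score)           # pooled median over all groups
  # Values equal to the median are counted as "not above" (an assumption)
  above <- factor(score > grand_median, levels = c(FALSE, TRUE))
  tab <- table(above = above, group = group)
  test <- if (exact) fisher.test(tab) else chisq.test(tab, correct = FALSE)
  list(median = grand_median, table = tab, p.value = test$p.value)
}

# Made-up toy data, not the data from this thread
score <- c(27, 28, 26, 20, 30, 24, 18, 28, 28, 28, NA, 16, 27, 26, 25, 22)
group <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1)
res <- median_test(score, group, exact = TRUE)
print(res$table)
print(res$p.value)
```

With samples this small, Fisher's exact test (exact = TRUE) is probably a safer choice than the chi-squared approximation.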


Secondly, for the question you are asking about the test statistic, I have not 
performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code data. 
I said something to the contrary in my first email; I apologize for that.

For continuous data, the Wilcoxon test is, I believe, a reasonable choice, but not when there are so many ties. If SPSS doesn't perform a Wilcoxon test for a difference in medians, then there's of course no reason to expect that the p-values would be the same.


Best,
 John

Thank you very much for your time!
Yours sincerely,
Bharat Rawlley

On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox <j...@mcmaster.ca> wrote:

Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating
PFD_n and drug_code as if they were scores for two different groups.
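The difference between the two kinds of call can be seen on a small made-up example (the data frame toy and its values are illustrative, not taken from this thread). Given two vectors, wilcox.test() treats them as two samples of scores; given a formula, it uses the right-hand side to define the groups.

```r
# Made-up toy data: scores and a 0/1 group indicator
toy <- data.frame(
  score = c(27, 28, 26, 20, 30, 24, 18, 29, 25, 22),
  group = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
)

# Two-vector form: treats 'score' and 'group' as two samples of scores,
# i.e. compares values between 18 and 30 with a sample of 0s and 1s.
wrong <- wilcox.test(toy$score, toy$group, exact = FALSE)

# Formula form: compares 'score' between the two groups defined by 'group'.
right <- wilcox.test(score ~ group, data = toy, exact = FALSE)

print(wrong$p.value)
print(right$p.value)
```

In the two-vector call every score exceeds every 0/1 code, so the test reports an extreme result that says nothing about the groups.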

I assume that what you really want to do is to treat PFD_n as a vector
of scores and drug_code as defining two groups. If that's correct, and
with your data in Data, you can try the following:

------snip ------

  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

     Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
   -2.000014e+00  5.037654e-05
sample estimates:
difference in location
               -1.000019

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact confidence intervals with ties

------snip ------

You can get an approximate confidence interval by specifying exact=FALSE:

------snip ------

  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

     Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
   -2.000014e+00  5.037654e-05
sample estimates:
difference in location
               -1.000019

------snip ------

As it turns out, your data are highly discrete and have a lot of ties
(see in particular PFD_n = 28):

------snip ------

  > xtabs(~ PFD_n + drug_code, data=Data)

       drug_code
PFD_n  0  1
     0  2  0
     16  1  1
     18  0  1
     19  0  1
     20  2  0
     22  0  1
     24  2  0
     25  1  2
     26  5  2
     27  4  2
     28  5 13
     30  1  2

------snip ------

I'm no expert in nonparametric inference, but I doubt whether the
approximate p-value will be very accurate for data like these.
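One way to avoid relying on the normal approximation when there are many ties is a Monte Carlo permutation test on the rank-sum statistic: shuffle the group labels many times and see how often the shuffled rank sum is at least as far from its expected value as the observed one. A minimal base-R sketch follows; the function name perm_ranksum, the number of resamples, the seed, and the toy data are all illustrative assumptions, and the resulting p-value is itself subject to simulation error.

```r
# Hypothetical helper: two-sided Monte Carlo permutation test based on
# the sum of midranks in the first group.
perm_ranksum <- function(score, group, n_perm = 9999, seed = 1) {
  keep <- !is.na(score) & !is.na(group)  # drop incomplete pairs
  score <- score[keep]
  group <- factor(group[keep])
  stopifnot(nlevels(group) == 2)
  r <- rank(score)                       # midranks are assigned to ties
  in_first <- group == levels(group)[1]
  n1 <- sum(in_first)
  expected <- n1 * (length(r) + 1) / 2   # mean rank sum under the null
  observed <- sum(r[in_first])
  set.seed(seed)
  # Rank sums for random relabellings of the observations
  perm <- replicate(n_perm, sum(sample(r, n1)))
  # Add 1 to numerator and denominator so the p-value is never zero
  extreme <- sum(abs(perm - expected) >= abs(observed - expected))
  list(rank_sum = observed, p.value = (extreme + 1) / (n_perm + 1))
}

# Made-up, heavily tied toy data (not the data from this thread)
score <- c(28, 28, 26, 20, 28, 24, 28, 28, 27, 26, 28, 25, 28, 27, 0, 28)
group <- c(1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
res <- perm_ranksum(score, group)
print(res)
```

Because the ties are built into the ranks being shuffled, the reference distribution reflects them directly rather than through a variance correction.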

I don't know why wilcox.test() (correctly used) and SPSS are giving you
slightly different results -- assuming that you're actually doing the
same thing in both cases. I couldn't help but notice that most of your
data are missing. Are you getting the same value of the test statistic
and different p-values, or is the test statistic different as well?

I hope this helps,
   John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

   Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email.
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n 
= c(1, NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, 
NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0, NA, 4, NA, 1, NA, 
NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, 
NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA, NA, 28, NA, 26, NA, 20, NA, 30, 
24, NA, NA, NA, NA, NA, 18, NA, 28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, 
NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27, 26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 
27, NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 25, NA, NA, NA, NA, NA, NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26, 20, 25, NA, NA, 
NA, 30, NA, NA, NA, 19, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -132L), class = 
c("tbl_df", "tbl", "data.frame"))

Yours sincerely,
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey <li...@dewey.myzen.co.uk> wrote:

Unfortunately your data did not come through. Try using dput() and then
pasting that into the body of your e-mail message.
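For example, dput() writes an R object as plain text that a reader can paste back into an R session to recreate it. A small sketch with a made-up data frame (toy and its values are illustrative):

```r
# Made-up data frame standing in for the real data
toy <- data.frame(drug_code = c(0, 1, 1, 0), PFD_n = c(27, NA, 28, 26))

# Print a text representation suitable for pasting into an e-mail
dput(toy)

# A reader can rebuild the object from that text; here the round trip
# is simulated by capturing the output and evaluating it again.
txt <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
print(all.equal(toy, rebuilt))
```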

On 18/01/2021 17:26, bharat rawlley via R-help wrote:

Hello,
On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following 
discrepancies which I am unable to explain.
Q1 In the attached data set, I was trying to compare freq4w_n in those with 
drug_code 0 vs 1. SPSS gives a P value 0.031 vs R gives a P value 0.001779.
The code I used in R is as follows -
wilcox.test(freq4w_n, drug_code, conf.int = T)


Q2 Similarly, in the same data set, when trying to compare PFD_n in those with 
drug_code 0 vs 1, SPSS gives a P value 0.038 vs R gives a P value < 2.2e-16.
The code I used in R is as follows -
wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = 
TRUE, paired = FALSE, conf.int = TRUE)


I have tried searching on Google and watching some YouTube tutorials, but I cannot 
find an answer. Any help will be really appreciated. Thank you!
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
