Dear Bharat Rawlley,

On 2021-01-20 1:45 p.m., bharat rawlley via R-help wrote:
Dear Professor John, Thank you very much for your reply! I agree with you that the non-parametric tests I mentioned in my previous email (Mood's median test and the median test) do not make sense in this situation, as they treat PFD_n and drug_code as different groups. As you correctly said, I want to use PFD_n as a vector of scores and drug_code to split it into two groups. This is exactly what the Independent samples median test does in SPSS. I wish to perform the same test in R and am unable to do so. Simply put, how do I perform the Independent samples median test in R just as it is performed in SPSS?
I'm afraid that I'm the wrong person to ask, since I haven't used SPSS in perhaps 30 years and have no idea what it does to test for differences in medians. A Google search for "independent samples median test in R" turns up a number of hits.
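For readers landing on this thread: one common reading of an "independent samples median test" is Mood's median test, which classifies each score as above or not above the pooled median and then tests the resulting 2 x 2 table against the grouping variable. The sketch below is a minimal base-R version under that assumption; whether SPSS does exactly this (including the continuity correction) is not confirmed anywhere in this thread. median_test() is a hypothetical helper, not a base-R function, and Data is rebuilt from the xtabs() cross-tabulation quoted later in the thread, so it holds only the 48 complete cases.

```r
# Complete cases rebuilt from the xtabs() cross-tabulation quoted in this thread
scores <- c(0, 16, 18, 19, 20, 22, 24, 25, 26, 27, 28, 30)
n0 <- c(2, 1, 0, 0, 2, 0, 2, 1, 5, 4, 5, 1)    # counts for drug_code == 0
n1 <- c(0, 1, 1, 1, 0, 1, 0, 2, 2, 2, 13, 2)   # counts for drug_code == 1
Data <- data.frame(
  PFD_n     = c(rep(scores, n0), rep(scores, n1)),
  drug_code = rep(c(0, 1), times = c(sum(n0), sum(n1)))
)

# Hypothetical helper (an assumption, not part of base R):
# Mood's median test via a chi-square test on the above/not-above table.
median_test <- function(score, group, correct = TRUE) {
  ok <- complete.cases(score, group)        # drop rows with missing values
  score <- score[ok]
  group <- group[ok]
  grand_median <- median(score)             # pooled median over all groups
  above <- factor(score > grand_median, levels = c(FALSE, TRUE),
                  labels = c("<= median", "> median"))
  tab <- table(above, group = group)
  list(grand_median = grand_median,
       table = tab,
       test = chisq.test(tab, correct = correct))
}

res <- median_test(Data$PFD_n, Data$drug_code)
res$grand_median
res$table
res$test
```

By my arithmetic the continuity-corrected p-value here comes out near 0.038, which is at least consistent with the SPSS figure quoted in the original post, but that agreement may be coincidental. Setting correct = FALSE gives the uncorrected chi-square test, and fisher.test(res$table) is an alternative when cell counts are small.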
Secondly, regarding your question about the test statistic: I have not performed the Wilcoxon rank sum test in SPSS for the PFD_n and drug_code data. I said something to the contrary in my first email, and I apologize for that.
For continuous data, the Wilcoxon test is, I believe, a reasonable choice, but not when there are so many ties. If SPSS doesn't perform a Wilcoxon test for a difference in medians, then there's of course no reason to expect that the p-values would be the same.
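One way to sidestep the normal approximation when ties are heavy is a Monte Carlo permutation test on the midranks. The following is only a sketch of that idea, not something proposed in this thread: it reshuffles the group labels and compares the rank sum for drug_code 0 with its permutation distribution. The seed and the number of permutations (B) are arbitrary choices, and Data is rebuilt from the cross-tabulation quoted later in the thread (complete cases only).

```r
# Complete cases rebuilt from the xtabs() cross-tabulation quoted in this thread
scores <- c(0, 16, 18, 19, 20, 22, 24, 25, 26, 27, 28, 30)
cnt0 <- c(2, 1, 0, 0, 2, 0, 2, 1, 5, 4, 5, 1)    # counts for drug_code == 0
cnt1 <- c(0, 1, 1, 1, 0, 1, 0, 2, 2, 2, 13, 2)   # counts for drug_code == 1
Data <- data.frame(
  PFD_n     = c(rep(scores, cnt0), rep(scores, cnt1)),
  drug_code = rep(c(0, 1), times = c(sum(cnt0), sum(cnt1)))
)

set.seed(2021)                       # arbitrary seed, for reproducibility only
B <- 10000                           # number of random relabellings (assumption)

r      <- rank(Data$PFD_n)           # midranks, so ties share an average rank
in_g0  <- Data$drug_code == 0
size0  <- sum(in_g0)

obs    <- sum(r[in_g0])                      # observed rank sum for group 0
W_obs  <- obs - size0 * (size0 + 1) / 2      # same scale as wilcox.test()'s W
center <- size0 * (nrow(Data) + 1) / 2       # rank sum expected under the null

# Permutation distribution: shuffle the group labels, recompute the rank sum
perm <- replicate(B, sum(r[sample(in_g0)]))

# Two-sided Monte Carlo p-value (the +1 terms count the observed labelling)
p_perm <- (1 + sum(abs(perm - center) >= abs(obs - center))) / (B + 1)

W_obs
p_perm
```

Because the result is simulated, p_perm changes slightly with the seed and with B; it is meant as a rough check on the approximate p-value rather than a replacement for it.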
Best, John
Thank you very much for your time!

Yours sincerely
Bharat Rawlley

On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox <j...@mcmaster.ca> wrote:

Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating PFD_n and drug_code as if they were scores for two different groups. I assume that what you really want to do is to treat PFD_n as a vector of scores and drug_code as defining two groups. If that's correct, and with your data in Data, you can try the following:

------snip ------

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

        Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000014e+00  5.037654e-05
sample estimates:
difference in location
             -1.000019

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact confidence intervals with ties

------snip ------

You can get an approximate confidence interval by specifying exact=FALSE:

------snip ------

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

        Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000014e+00  5.037654e-05
sample estimates:
difference in location
             -1.000019

------snip ------

As it turns out, your data are highly discrete and have a lot of ties (see in particular PFD_n = 28):

------snip ------

> xtabs(~ PFD_n + drug_code, data=Data)
     drug_code
PFD_n  0  1
   0   2  0
   16  1  1
   18  0  1
   19  0  1
   20  2  0
   22  0  1
   24  2  0
   25  1  2
   26  5  2
   27  4  2
   28  5 13
   30  1  2

------snip ------

I'm no expert in nonparametric inference, but I doubt whether the approximate p-value will be very accurate for data like these.
I don't know why wilcox.test() (correctly used) and SPSS are giving you slightly different results -- assuming that you're actually doing the same thing in both cases. I couldn't help but notice that most of your data are missing. Are you getting the same value of the test statistic and different p-values, or is the test statistic different as well?

I hope this helps,
John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

Thank you for the reply and suggestion, Michael!

I used dput() and this is the output I can share with you. Simply explained, I have 3 columns, namely drug_code, freq4w_n and PFD_n. Each column has 132 values (including NA). The problem with the Wilcoxon Rank Sum test has been described in my first email. Please do let me know if you need any further clarification from my side! Thanks a lot for your time!

structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1,
1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0,
0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0,
1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0),
freq4w_n = c(1, NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA,
10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, NA, 0, NA, 0,
NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1,
2, 2, NA, 28, 0, NA, 4, NA, 1, NA, NA, NA, NA, NA, 0, 3,
1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 2, 12, 0, NA, NA, NA,
0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA,
NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0,
NA, 2, 8, 3, NA, NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA,
NA, NA, NA, NA),
PFD_n = c(27, NA, NA, 28, NA, 26, NA, 20, NA, 30, 24, NA, NA, NA, NA, NA,
18, NA, 28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28,
NA, NA, NA, NA, NA, NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27,
26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25,
27, NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA,
28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, NA, 25, NA, NA, NA,
NA, NA, NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28,
NA, 26, 20, 25, NA, NA, NA, 30, NA, NA, NA, 19, NA, NA, NA, NA,
NA, NA, NA, NA)),
row.names = c(NA, -132L), class = c("tbl_df", "tbl", "data.frame"))

Yours sincerely
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey <li...@dewey.myzen.co.uk> wrote:

Unfortunately your data did not come through. Try using dput() and then pasting that into the body of your e-mail message.

On 18/01/2021 17:26, bharat rawlley via R-help wrote:

Hello,

On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following discrepancies which I am unable to explain.

Q1 In the attached data set, I was trying to compare freq4w_n in those with drug_code 0 vs 1. SPSS gives a P value of 0.031, whereas R gives a P value of 0.001779. The code I used in R is as follows -

wilcox.test(freq4w_n, drug_code, conf.int = T)

Q2 Similarly, in the same data set, when trying to compare PFD_n in those with drug_code 0 vs 1, SPSS gives a P value of 0.038, whereas R gives a P value < 2.2e-16. The code I used in R is as follows -

wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = TRUE, paired = FALSE, conf.int = TRUE)

I have tried searching on Google and watching some YouTube tutorials, but I cannot find an answer. Any help will be really appreciated. Thank you!
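To make the difference between the two calling conventions concrete, here is a small sketch contrasting the two-vector call from the original post with the formula interface recommended in the reply. It assumes a data frame named Data rebuilt from the cross-tabulation quoted in the reply, so it contains only the 48 complete cases; the p-value from the two-vector call therefore will not reproduce the figure in the original post exactly, but it shows the same effect.

```r
# Complete cases rebuilt from the xtabs() cross-tabulation quoted in this thread
scores <- c(0, 16, 18, 19, 20, 22, 24, 25, 26, 27, 28, 30)
cnt0 <- c(2, 1, 0, 0, 2, 0, 2, 1, 5, 4, 5, 1)    # counts for drug_code == 0
cnt1 <- c(0, 1, 1, 1, 0, 1, 0, 2, 2, 2, 13, 2)   # counts for drug_code == 1
Data <- data.frame(
  PFD_n     = c(rep(scores, cnt0), rep(scores, cnt1)),
  drug_code = rep(c(0, 1), times = c(sum(cnt0), sum(cnt1)))
)

# Two-vector call: x and y are taken as the two SAMPLES, so this compares
# the PFD_n scores (mostly 16 to 30) against the 0/1 group codes themselves.
wrong <- wilcox.test(Data$PFD_n, Data$drug_code, exact = FALSE)

# Formula call: PFD_n is the response and drug_code defines the two groups.
right <- wilcox.test(PFD_n ~ drug_code, data = Data, exact = FALSE)

wrong$p.value   # tiny, because scores and codes barely overlap
right$statistic # W for group 0 versus group 1
right$p.value
```

exact = FALSE is passed only to suppress the warnings about ties; it does not change the normal-approximation p-value that wilcox.test() falls back to for tied data.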
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/