On Jul 20, 2009, at 5:22 AM, Tal Galili wrote:

Hello David Winsemius and the rest of the R help group,

David, I tried to answer your questions to the best of my abilities. If I was unclear or am still leaving some things out, please help me focus my description even further. Here are my answers to the questions you posed:
1) Please define "better" -
"better" is the one that is able to handle the questions at hand (marginal homogeneity and symmetry) while giving meaningful results even though the data is sometimes sparse (with zeros in it) and the sample size is somewhat small (around 25 kids)

Frankly, the phrases "marginal homogeneity" and "symmetry" are, for me anyway, not particularly evocative of an interpretable sort of difference. I try to express my findings in terms I think my audience may have some chance of understanding: odds ratios or risk ratios or difference in mean effects ...


2) And now ... define "right"
What I meant by "right" is "what test did each of these procedures just perform" and also "what can I learn from each of the P values if they were to pass the significance bar (of, let's say, .05)"

3) "Perhaps from the perspective of a statistically naive reviewer."
Thank you for pointing out that this was superficial. I would love any help you could give in deepening my understanding.

It appeared from context (which was snipped) that you thought one was "better" because its p-value was lower. If the criterion by which you choose one statistical test over another is whether or not it happens to produce a signal p <0.05, then I think you are dredging rather than analyzing. I think the question should be instead whether the test is the most powerful for the particular hypothesis and data situation.


4) "The problem I am trying to solve" is for the following situation:

The data set:
I am analyzing a data set with subjects (kids) listening to the same music two times (randomized, at different times and so on). The condition of the experiment is a bit different the first time the kid listens (X=1) than the second time (X=2). The response (Y) the kid makes in the experiment is recorded as an ordered number with three levels: -1, 0, 1

So you would certainly want a test that properly handles ordinal effects. I am not sure that was clear at all from your earlier questions. Tests of hypotheses regarding ordered alternatives are often more powerful than ones that evaluate less specific alternatives.


The (statistical) question: did the difference in the experiment conditions yield different rankings from the kids? And if so, was there a specific direction? E.g.: did kids who previously (in part one of the experiment) answered mostly -1 and 0 now (in part two of the experiment) start answering more 0 and 1? Or did kids who previously answered mostly 0 now start answering -1 and 1? And so on.

Analyses approach:
There are two basic ways to do this.
1) The first one is a Willcox test, to see if there was change in answers (Y) between the two situations (X=1, X=2)

I am here puzzled. Is the Willcox test a well known one in your academic domain? If it is I apologize for my lack of breadth in named tests. Or could you be referring to what is invoked in R with wilcox.test()? I am guessing from context that you might be asking about the Wilcoxon signed-rank test for paired data situations. It would in fact address the ordering of your paired outcomes, but all of the Wilcoxon tests are based on the measures being from a continuous distribution and statistical validity for your situation would be questionable.

I would think that a proportional odds model for ordinal repeated responses would fit the data situation and the hypothesis of interest. You may want to search out Laura Thompson's R/S companion to Agresti's text. She has some worked examples.
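The concern about continuity can be illustrated with a minimal sketch. The vectors `first` and `second` below are made-up responses, not the study data. With only three response levels, many paired differences are zero and the remaining ones are heavily tied, so wilcox.test() has to rely on a normal approximation:

```r
# Hypothetical paired responses for 10 kids on the -1/0/1 scale (not real data)
first  <- c(-1, -1, 0, 0, 0, 1, -1, 0, 1, 0)
second <- c( 0,  0, 0, 1, 0, 1,  0, 1, 1, 0)

d <- second - first
n_zero     <- sum(d == 0)                      # pairs with no change are dropped by the test
n_distinct <- length(unique(abs(d[d != 0])))   # distinct nonzero |differences|

# exact = FALSE requests the normal approximation, because an exact
# p-value cannot be computed when there are ties or zero differences
res <- wilcox.test(second, first, paired = TRUE, exact = FALSE)

print(c(zeros = n_zero, distinct_abs_diffs = n_distinct))
print(res$p.value)
```

In this toy data half of the pairs carry no information for the test and every remaining difference has the same magnitude, which is the kind of situation the continuity assumption does not cover.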

2) The second one is to produce a 3 by 3 table, with the rows indicating what the kids answered in setting 1 of the experiment, and the columns indicating the kids' answers in setting 2.
Now the question is:
was there marginal homogeneity? If not, then that is an indicator that the general response to the experimental settings was different for the kids.
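A small sketch of how such a table could be built in R, assuming the paired responses are stored in two vectors (the names `first` and `second` and the values are hypothetical). Fixing the factor levels keeps the table 3 by 3 even when a response level never occurs:

```r
# Hypothetical paired responses (not the study data)
lev    <- c(-1, 0, 1)
first  <- factor(c(-1, -1, 0, 0, 0, 1, -1, 0, 1, 0), levels = lev)
second <- factor(c( 0,  0, 0, 1, 0, 1,  0, 1, 1, 0), levels = lev)

# Rows: answer in setting 1; columns: answer in setting 2
tab <- table(setting1 = first, setting2 = second)
print(tab)

# The two margins that marginal homogeneity compares
print(margin.table(tab, 1))
print(margin.table(tab, 2))
```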

Can you put into natural language what you will explain to your audience once you determine the presence or absence of "marginal homogeneity"?

Challenges:
1) what about symmetry ?
As Peter pointed out - you can easily check that the following two matrices have the same homogeneous margins, but only one is symmetric:
3 2 1
2 3 2
1 2 3

3 1 2
3 3 1
0 3 3
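The claim about these two matrices can be checked directly in base R. This is only a sketch; the names `m1` and `m2` are introduced here for illustration:

```r
m1 <- matrix(c(3, 2, 1,
               2, 3, 2,
               1, 2, 3), ncol = 3, byrow = TRUE)
m2 <- matrix(c(3, 1, 2,
               3, 3, 1,
               0, 3, 3), ncol = 3, byrow = TRUE)

# Row and column totals agree within each matrix and across both
print(rbind(rows_m1 = rowSums(m1), cols_m1 = colSums(m1),
            rows_m2 = rowSums(m2), cols_m2 = colSums(m2)))

# Only the first matrix is symmetric
print(c(m1 = isSymmetric(m1), m2 = isSymmetric(m2)))
```

So symmetry implies marginal homogeneity, but the second matrix shows that the reverse does not hold.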

And running the two tests we have yields very interesting results (and if someone has an explanation for them, they would be greatly appreciated):

> tt <- as.table(t(matrix(c(30,10,20,
+                           30,30 ,10,
+                           0 ,30 ,30)
+                           , ncol .... [TRUNCATED]

The truncation is most unfortunate since it results in our not seeing what made these two calls different.

> print(tt)
   A  B  C
A 30 10 20
B 30 30 10
C  0 30 30

> mcnemar.test(tt)

        McNemar's Chi-squared test

data:  tt
McNemar's chi-squared = 40, df = 3, p-value = 1.066e-08


> mh_test(tt)

        Asymptotic Marginal-Homogeneity Test

data:  response by groups (Var1, Var2)
         stratified by block
chi-squared = 0, df = 2, p-value = 1


> tt <- as.table(t(matrix(c(30,10,20,
+                           30,30 ,10,
+                           1 ,30 ,30)
+                           , ncol .... [TRUNCATED]

> print(tt)
   A  B  C
A 30 10 20
B 30 30 10
C  1 30 30

> mcnemar.test(tt)

        McNemar's Chi-squared test

data:  tt
McNemar's chi-squared = 37.1905, df = 3, p-value = 4.194e-08

The truncation snipped off the likely sources of the differences.


> mh_test(tt)

        Asymptotic Marginal-Homogeneity Test

data:  response by groups (Var1, Var2)
         stratified by block
chi-squared = 0.0244, df = 2, p-value = 0.9879


2) what about sparsity ?
What is the correct way to handle a sparse table that includes some zeros? (Is filling them with 1's, in cases where mcnemar.test results in NA's, a legitimate strategy?)
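A minimal sketch of where such NA's can come from, assuming the symmetry statistic sums (n_ij - n_ji)^2 / (n_ij + n_ji) over off-diagonal pairs. The table `x` is made up for illustration. When both cells of a pair are zero, the term is 0/0:

```r
# Hypothetical sparse table: cells [1,2] and [2,1] are both zero
x <- matrix(c(5, 0, 2,
              0, 4, 1,
              3, 2, 6), ncol = 3, byrow = TRUE)

up    <- upper.tri(x)
terms <- ((x - t(x))[up])^2 / (x + t(x))[up]

print(terms)        # the first term is NaN because it is 0/0
print(sum(terms))   # so the whole statistic is NaN
```

A single zero cell is not the issue; the undefined term arises only when a cell and its mirror image are both empty.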



David, thank you for the queries and the good intentions,
I would be very happy for any help, directions, or clarifications from you and from the other members of this wonderful discussion group.


With much gratitude,
Tal



I hope this helped clarify.


On Mon, Jul 20, 2009 at 3:20 AM, David Winsemius <dwinsem...@comcast.net > wrote:

On Jul 19, 2009, at 6:09 PM, Tal Galili wrote:

Hello Charles,
Thank you for the detailed reply.

I am still left with the leading question, which is: which test should I use when analyzing the 3 by 3 matrix I have? The mcnemar.test or the mh_test?
Is one necessarily better than the other?

Please define "better".


(for example, for sparser matrices?)

That does not help.



What about:
mh_test(as.table(matrix(1:16,4)))
It returns a very significant result:
chi-squared = 11.4098, df = 3, p-value = 0.009704

Whereas "mcnemar.test(matrix(1:16,4))" didn't:
McNemar's chi-squared = 11.5495, df = 6, p-value = 0.0728

So which one is "right" ?

And now ... define "right".


(from the looks of it, the mh_test is doing much better)

Perhaps from the perspective of a statistically naive reviewer.



Should the strategy be to try and use both methods, and start digging when
one doesn't sit well with the other?

I am reminded of Jim Holtman's tag line: "What problem are you trying to solve?"




Thanks,
Tal



On Sun, Jul 19, 2009 at 10:26 PM, Charles C. Berry <cbe...@tajo.ucsd.edu >wrote:

On Sun, 19 Jul 2009, Tal Galili wrote:

Hello David,
Thank you for your answer.

Do you know then what "mcnemar.test" does in the case of a 3*3 table?


     print(mcnemar.test)

will show you what it does.

Because the results for the simple example I gave are rather different (P value of 0.053 vs. 0.73)


The test mcnemar.test() constructs is one of symmetry, which is equivalent to marginal homogeneity in hierarchical log-linear models, as I recall from Bishop, Fienberg, and Holland's 1975 opus on count data.

Stuart-Maxwell uses the dispersion matrix of marginal difference.

These are two different tests. I suspect that Stuart-Maxwell is less
susceptible to continuity issues in very sparse tables, which may account
for the difference you see here.
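A base-R sketch of the Stuart-Maxwell statistic, following the algorithm in the original post (marginal differences and their estimated covariance matrix, with the last category dropped). The function name `stuart_maxwell` is introduced here for illustration and is not part of any package:

```r
stuart_maxwell <- function(x) {
  k <- nrow(x)
  d <- rowSums(x) - colSums(x)           # marginal differences (they sum to zero)
  S <- -(x + t(x))                       # off-diagonal covariance terms
  diag(S) <- rowSums(x) + colSums(x) - 2 * diag(x)
  keep <- seq_len(k - 1)                 # drop one category to make S invertible
  stat <- drop(t(d[keep]) %*% solve(S[keep, keep]) %*% d[keep])
  df   <- k - 1
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# The 4 by 4 example from this thread
res4 <- stuart_maxwell(matrix(1:16, 4))
print(res4)   # statistic about 11.41 on 3 df, matching the mh_test output quoted above

# The 3 by 3 example from the original post
tt3 <- matrix(c(1, 4, 1,
                0, 5, 5,
                3, 1, 5), ncol = 3, byrow = TRUE)
res3 <- stuart_maxwell(tt3)
print(res3)   # statistic 0.625 on 2 df
```

The degrees of freedom differ from the symmetry test (k - 1 instead of k(k - 1)/2), which is one visible sign that the two procedures test different hypotheses.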



In case mcnemar.test can't really handle a 3*3 matrix (or larger), shouldn't there be an error message for this case? (If so, who should I turn to in order to report this?)


Well, the code is pretty straightforward and

     mcnemar.test(matrix(1:16,4))

returns 11.5495 which is correct.

It looks like there is nothing to report.


Chuck


Thanks again,
Tal





On Sun, Jul 19, 2009 at 3:47 PM, David Freedman <3.14da...@gmail.com>
wrote:


There is a function mh_test in the coin package.

library(coin)
mh_test(tt)

The documentation states, "The null hypothesis of independence of row and column totals is tested. The corresponding test for binary factors x and y is known as McNemar test. For larger tables, Stuart’s W0 statistic (Stuart, 1955, Agresti, 2002, page 422, also known as Stuart-Maxwell test) is computed."

hth, david freedman


Tal Galili wrote:


Hello all,

I wish to perform a McNemar test for a 3 by 3 matrix.
By running the standard R command I am getting a result, but I am not sure I am getting the correct one.
Here is an example code:

(tt <-  as.table(t(matrix(c(1,4,1    ,
                         0,5,5,
                         3,1,5), ncol = 3))))
mcnemar.test(tt, correct=T)
#And I get:
     McNemar's Chi-squared test
data:  tt
McNemar's chi-squared = 7.6667, df = 3, p-value = *0.05343*


Now I was wondering if the test I just performed is the correct one.

From looking at the Wikipedia article on McNemar's test (http://en.wikipedia.org/wiki/McNemar's_test), it is said that:
"The Stuart-Maxwell test <http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm> is different generalization of the McNemar test, used for testing marginal homogeneity in a square table with more than two rows/columns"

From searching for a Stuart-Maxwell test <http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm> in Google, I found an algorithm here:

http://www.m-hikari.com/ams/ams-password-2009/ams-password9-12-2009/abbasiAMS9-12-2009.pdf


From running this algorithm I am getting a different P value. Here is the (somewhat ugly) code I produced for this:
get.d <- function(xx)
{
  # differences between the row and column marginal totals
  ret1 <- margin.table(xx, 1) - margin.table(xx, 2)
  return(ret1)
}

get.s <- function(xx)
{
  # estimated covariance matrix of the marginal differences
  the.s <- xx
  for (i in 1:dim(xx)[1])
  {
    for (j in 1:dim(xx)[2])
    {
      if (i == j)
      {
        the.s[i, j] <- margin.table(xx, 1)[i] + margin.table(xx, 2)[i] - 2 * xx[i, i]
      } else {
        the.s[i, j] <- -(xx[i, j] + xx[j, i])
      }
    }
  }
  return(the.s)
}

chi.statistic <- t(get.d(tt)[-3]) %*% solve(get.s(tt)[-3,-3]) %*% get.d(tt)[-3]
# the P value is the upper tail of the chi-squared distribution
paste("the P value:", signif(pchisq(chi.statistic, 2, lower.tail = FALSE), 4))

#and the result was:
"the P value: 0.7316"



So to summarize my questions:
1) can I use "mcnemar.test" for 3*3 (or more) tables ?
2) if so, what test is being performed (Stuart-Maxwell <http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm>)?
3) Do you have a recommended link to an explanation of the algorithm
employed?


Thanks,
Tal



snipped various sigs



My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il



David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
