Re: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values

Daniel Malter Fri, 24 Jul 2009 18:29:53 -0700

Hi Tal, it is necessary to understand the method, but not for all applied
researchers it is necessary, I think, to understand all the specifics or
underlying processes (I know others would disagree on that). As to my
suggestion: The Monte-Carlo simulations have no other purpose than showing
that the method I suggested yields unbiased estimates, i.e., the
coefficients of the fixed effect(s) are not biased compared to their true
value. If they were biased, say, you would get an effect of the "test"
variable of 3 on average over the Monte-Carlo simulations even though the
true coefficient is 2, then the estimate would be biased and, thus, garbage.
It would tell you that the method is inappropriate to analyze your data.
However, since the linear mixed effect model brings about the true
coefficient, it tells you that it is okay to use it.
 
As to the mixed effects model itself (the first part of my response), you
assume that
 
Probability(answer=yes) =
exp(b*test+random.effect(i))/(1+exp(b*test+random.effect(i)))
 
The modeling of this nonlinear shapes makes sure that values outside the
interval 0/1 cannot occur. The model predicts the probability with which an
individual is expected to give yes as an answer to the respective question.
You are interested in the estimate for "b," i.e., how does "test" affect the
probability to receive yes as an answer. And this is what the mixed effects
model gives you, respecting that there is assumed random normal variation
among the subject (in other words, respecting that some are just more likely
than others to answer yes).
 
But Robert has pointed out many valuable alternative approaches.
 
Best,
Daniel
 
-------------------------
cuncta stricte discussurus
-------------------------


  _____  

Von: Tal Galili [mailto:tal.gal...@gmail.com] 
Gesendet: Friday, July 24, 2009 7:07 PM
An: Robert A LaBudde
Cc: Daniel Malter; r-help@r-project.org
Betreff: Re: [R] How to test frequency independence (in a 2 by 2 table)
withmany missing values


Hello dear Robert and Daniel,
I read your replies with much interest, thank you very much for them!

Dear Daniel,
Your approach is, as Robert said, "whopped me alongside the head" as well.
So much so that I fear it will be much more then my current level of
understanding can take for the project I have at hand.
For the time being (and as much as it saddens me) I will not pursue this as
rigorously as it deserves, and hope to look at it again after I will do my
homework of studying mixed model do the degree of understanding your reply.
I will thank you for mentioning the use of Monte-Carlo, since although I had
the chance of working with it on other situations, somehow it didn't occur
to me to just have a go at it for the question I brought up in this e-mail.
So thank you again for that, and for the very interesting response.

Dear Robert,
With response to your comments in the second e-mail, here are a few
questions:
1) Do you think I could do LRT (based on ML) by using prop.test on the
marginals of the tables?
I have to admit I am not fully sure how this is different then using the
fisher.test on the marginals of the tables (I have the feeling this is
something very basic of which I am showing my ignorance about, so I
apologize for that).
Regarding the resampling alternative, that I am not sure how to try and do.
Could you give me a a pointer or two on how to try and use the survival
package for my situation? (I don't have any experience using that package
for this purpose, and some general direction would be of great help).
2) I had the feeling that the mcnemar test would be a poor choice, which is
what led me to writing to the help group :)
3) Regarding the assumptions of the mixed models, that is in part what
"scares" me. And since I (for now) don't have experience using these models,
I am not so keen on taking them on just yet.
4) I didn't think about using Bayesian methods for it. I admit I never tried
them in R. If you have a name or two to drop me, I would love to go through
it, at least for the educational value of it.
5) Doing this was exactly my thought, but I wasn't sure how much this would
be valid. It sounds from what you wrote that it might be. So what I will do
(assuming I will receive a bit of help on some of the tests you brought up)
is to try and compare this method in section "5)" you wrote to the method(s)
in "1)"  (using Monte-Carlo) , so to see which one might work better.
6) I am sorry to say that the data I am analyzing wouldn't give me
concordant pairs. But regarding your question - I would love to be able to
find a solution for "how to obtain a valid confidence interval for the ratio
of proportions for the two tests". But don't know where to even begin on
that one. Any insights would be of help.




Both Robert and Daniel, I am amazed at the depth and length of your
responses and would like to thank you again for your generosity.
With much regards,
Tal




 
 




On Fri, Jul 24, 2009 at 11:43 PM, Robert A LaBudde <r...@lcfltd.com> wrote:


Daniel's response about using a mixed model "whopped me alongside the head",
so now I'm thinking more clearly. Here are a few more comments:

1. The LRT based on ML or the resampling alternative is probably the most
powerful among reasonable algorithms to do a test on your problem. You might
even get a survival package to do it for you.

2. The McNemar test with casewise deletion should work well enough, so long
as you don't lose too much data (e.g., less than 50%). Unfortunately, this
seems to be your situation.

3. The mixed model (or, more properly, the generalized mixed model) approach
will involve a number of new assumptions not necessary in more direct
methods.

4. A Bayesian approach would be better than 1) above if you've got some
useful prior information.

5. If you have a lot of missing data (> 50%), as you appear to have from
your example,   you generally will get more power than the McNemar test from
just ignoring the matching on subjects, treating it as unpaired data (2
independent samples). Not as good as method 1), but the high missing rate
would justify the independence assumption, as it would reduce severely
whatever correlation is present on the same vs. different subjects. The
extra data may be worth more than the pairing, particularly if there is a
large percentage of concordant pairs in the McNemar test of symmetry.

6. Finally, why are you even doing a test? Tests don't provide physically
useful information, only that your study has enough subjects to
statistically detect an effect or not. Wouldn't you be better off asking how
to obtain a valid confidence interval for the difference or ratio of
proportions for the two tests? The concordant pairs would then be useful. 



At 03:17 PM 7/24/2009, Daniel Malter wrote:



Hi Tal, you can use the lme4 library and use a random effects model. I will
walk you through the example below (though, generally you should ask a
statistician at your school about this). At the very end, I will include a
loop for Monte-Carlo simulations that shows that the estimation of the fixed
effects is unbiased, which assures you that you are estimating the "correct"
coefficient. In addition, as Robert says, you should remove all subjects for
which both observations are missing.

##First, we simulate some data.
##Note how the data structure matches the data structure you have:

id=rep(1:100,each=2)
#the subject ID

test=rep(0:1,100)
# first or second measurement/test (your variable of interest)

randeff=rep(rnorm(100,0,1),each=2)
# a random effect for each subject

linear.predictor=randeff+2*test
#the linear predictor
#note the coefficient on the fixed effect of "test" is 2

probablty=exp(linear.predictor)/(1+exp(linear.predictor))
#compute the probability using the linear predictor

y=rbinom(200,1,probablty)
#draw 0/1 dependent variables
#with probability according to the variable probablty

miss=sample(1:200,20,replace=FALSE)
#some indices for missing variables

y[miss]=NA
#replace the dependent variable with NAs for the "miss"ing indices


##Some additional data preparation

id=factor(id)
#make sure id is coded as a factor/dummy

mydata=data.frame(y,test,id)
#bind the data you need for estimation in a dataframe


##Run the analysis

library(lme4)
reg=lmer(y~test+(1|id),binomial,data=mydata)
summary(reg)



##And here are the Monte-Carlo simulations

library(lme4)

bootstraps=200
estim=matrix(0,nrow=bootstraps,ncol=2)

for(i in 1:bootstraps){
id=rep(1:100,each=2)
test=rep(0:1,100)
randeff=rep(rnorm(100,0,1),each=2)
linear.predictor=randeff+2*test
probablty=exp(linear.predictor)/(1+exp(linear.predictor))
y=rbinom(200,1,probablty)
miss=sample(1:200,20,replace=FALSE)
y[miss]=NA
id=factor(id)
mydata=data.frame(y,test,id)
reg=lmer(y~test+(1|id),binomial,data=mydata)
estim[i,]=...@fixef
}

mean(estim[,1])
#mean estimate of the intercept from 200 MC-simulations
#should be close to zero (no intercept in the linear.predictor)

mean(estim[,2])
#mean estimate of the 'test' fixed effect from 200 MC-simulations
#should be close to 2 (as in the linear.predictor)

#plot estimated coefficients from 200 MC simulations
par(mfcol=c(1,2))
hist(estim[,1],main="Intercept")
hist(estim[,2],main="Effect of variable 'test'")

HTH,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Ursprüngliche Nachricht-----
Von: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Im
Auftrag von Tal Galili
Gesendet: Friday, July 24, 2009 1:23 PM
An: r-help@r-project.org
Betreff: [R] How to test frequency independence (in a 2 by 2 table) withmany
missing values

Hello dear R help group.

My question is statistical and not R specific, yet I hope some of you might
be willing to help.

*Experiment settings*:  We have a list of subjects. each of them went
through two tests with the answer to each can be either 0 or 1.
*Goal*: We want to know if the two experiments yielded different results in
the subjects answers.
*Statistical test (and why it won't work)*: Naturally we would turn to
performing a mcnemar test. But here is the twist: we have missing values in
our data.
For our purpose, let's assume the missingnes is completely at random, and we
also have no data to use for imputation. Also, we have much more missing
data for experiment number 2 then in experiment number 1.

So the question is, under these settings, how do we test for experiment
effect on the outcome?

So far I have thought of two options:
1) To perform the test only for subjects that has both values. But they are
too scarce and will yield low power.
2) To treat the data as independent and do a pearson's chi square test
(well, an exact fisher test that is) on all the non-missing data that we
have. The problem with this is that our data is not fully independent (which
is a prerequisite to chi test, if I understood it correctly). So I am not
sure if that is a valid procedure or not.

Any insights will be warmly welcomed.


p.s: here is an example code producing this scenario.

set.seed(102)

x1 <- rbinom(100, 1, .5)
x2 <- rbinom(100, 1, .3)

X <- data.frame(x1,x2)
tX <- table(X)
margin.table(tX,1)
margin.table(tX,2)
mcnemar.test(tX)

put.missings <- function(x.vector, na.percent) { turn.na <-
rbinom(length(x.vector), 1, na.percent)  x.vector[turn.na == 1] <- NA
return(x.vector)
}


x1.m <- put.missings(x1, .3)
x2.m <- put.missings(x2, .6)

tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m)))
fisher.test(tX.m)




With regards,
Tal









--
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

       [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================






-- 
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il




        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values

Reply via email to