Hello Daniel,
Thank you for the extended answer. You gave me one more reason to delve into mixed models in the future. I will be honest and say that I believe I will explore Robert's approach more, due to its simplicity. But nevertheless, I am very grateful for your reply.
With best regards,
Tal

On Sat, Jul 25, 2009 at 4:26 AM, Daniel Malter <dan...@umd.edu> wrote:
> Hi Tal, it is necessary to understand the method, but it is not necessary,
> I think, for all applied researchers to understand all the specifics or
> underlying processes (I know others would disagree on that). As to my
> suggestion: the Monte-Carlo simulations have no other purpose than showing
> that the method I suggested yields unbiased estimates, i.e., that the
> coefficients of the fixed effect(s) are not biased compared to their true
> values. If they were biased, say, if you got an effect of the "test"
> variable of 3 on average over the Monte-Carlo simulations even though the
> true coefficient is 2, then the estimate would be biased and, thus,
> garbage. It would tell you that the method is inappropriate for analyzing
> your data. However, since the linear mixed effects model recovers the true
> coefficient, it tells you that it is okay to use it.
>
> As to the mixed effects model itself (the first part of my response), you
> assume that
>
> Probability(answer=yes) =
>   exp(b*test + random.effect(i)) / (1 + exp(b*test + random.effect(i)))
>
> Modeling this nonlinear shape makes sure that values outside the interval
> (0, 1) cannot occur. The model predicts the probability with which an
> individual is expected to give yes as an answer to the respective
> question. You are interested in the estimate of "b," i.e., how "test"
> affects the probability of receiving yes as an answer. And this is what
> the mixed effects model gives you, while respecting that there is assumed
> random normal variation among the subjects (in other words, respecting
> that some subjects are just more likely than others to answer yes).
>
> But Robert has pointed out many valuable alternative approaches.
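The link function Daniel describes can be sketched in a few lines of R. This is only an illustration of the formula above; the coefficient value 2 and the random-effect values are made-up examples, not estimates from any fitted model:

```r
# Inverse logit: maps any real-valued linear predictor into (0, 1),
# which is why predicted probabilities cannot fall outside that interval.
inv.logit <- function(x) exp(x) / (1 + exp(x))

b <- 2                        # example value of the "test" coefficient
random.effect <- c(-1, 0, 1)  # three hypothetical subjects around the mean

# Predicted P(answer = yes) for test = 0 vs. test = 1, per subject
p.test0 <- inv.logit(b * 0 + random.effect)
p.test1 <- inv.logit(b * 1 + random.effect)

round(p.test0, 3)   # 0.269 0.500 0.731
round(p.test1, 3)   # 0.731 0.881 0.953

# Base R's plogis() computes the same inverse logit
all.equal(inv.logit(1.5), plogis(1.5))   # TRUE
```

(A side note on the code later in this thread: in current versions of lme4, the binomial model is fitted with glmer(y ~ test + (1|id), family = binomial, data = mydata) rather than the older lmer(..., binomial) call.)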
>
> Best,
> Daniel
>
> -------------------------
> cuncta stricte discussurus
> -------------------------
>
> ------------------------------
> *From:* Tal Galili [mailto:tal.gal...@gmail.com]
> *Sent:* Friday, July 24, 2009 7:07 PM
> *To:* Robert A LaBudde
> *Cc:* Daniel Malter; r-help@r-project.org
> *Subject:* Re: [R] How to test frequency independence (in a 2 by 2 table) with many missing values
>
> Hello dear Robert and Daniel,
> I read your replies with much interest, thank you very much for them!
>
> Dear Daniel,
> Your approach, as Robert said, "whopped me alongside the head" as well. So
> much so that I fear it will be much more than my current level of
> understanding can take for the project I have at hand. For the time being
> (and as much as it saddens me) I will not pursue this as rigorously as it
> deserves, and hope to look at it again after I have done my homework of
> studying mixed models to the degree needed to understand your reply. Thank
> you for mentioning the use of Monte-Carlo: although I have had the chance
> to work with it in other situations, somehow it didn't occur to me to just
> have a go at it for the question I brought up in this e-mail. So thank you
> again for that, and for the very interesting response.
>
> Dear Robert,
> In response to your comments in the second e-mail, here are a few
> questions:
> 1) Do you think I could do an LRT (based on ML) by using prop.test on the
> marginals of the tables? I have to admit I am not fully sure how this is
> different from using fisher.test on the marginals of the tables (I have
> the feeling this is something very basic about which I am showing my
> ignorance, so I apologize for that).
> Regarding the resampling alternative, I am not sure how to try and do
> that. Could you give me a pointer or two on how to try and use the
> survival package for my situation?
> (I don't have any experience using that package for this purpose, and some
> general direction would be of great help.)
> 2) I had the feeling that the McNemar test would be a poor choice, which
> is what led me to writing to the help group :)
> 3) Regarding the assumptions of the mixed models, that is in part what
> "scares" me. And since I (for now) don't have experience using these
> models, I am not so keen on taking them on just yet.
> 4) I didn't think about using Bayesian methods for it. I admit I have
> never tried them in R. If you have a name or two to drop me, I would love
> to go through them, at least for the educational value.
> 5) Doing this was exactly my thought, but I wasn't sure how valid it would
> be. From what you wrote, it sounds like it might be. So what I will do
> (assuming I receive a bit of help on some of the tests you brought up) is
> to try to compare the method in your section "5)" to the method(s) in
> "1)" (using Monte-Carlo), to see which one might work better.
> 6) I am sorry to say that the data I am analyzing wouldn't give me
> concordant pairs. But regarding your question: I would love to be able to
> find a solution for "how to obtain a valid confidence interval for the
> ratio of proportions for the two tests", but I don't know where to even
> begin on that one. Any insights would be of help.
>
> Both Robert and Daniel, I am amazed at the depth and length of your
> responses and would like to thank you again for your generosity.
> With much regards,
> Tal
>
>
> On Fri, Jul 24, 2009 at 11:43 PM, Robert A LaBudde <r...@lcfltd.com> wrote:
>
>> Daniel's response about using a mixed model "whopped me alongside the
>> head", so now I'm thinking more clearly. Here are a few more comments:
>>
>> 1. The LRT based on ML, or the resampling alternative, is probably the
>> most powerful among reasonable algorithms to do a test on your problem.
>> You might even get a survival package to do it for you.
>>
>> 2. The McNemar test with casewise deletion should work well enough, so
>> long as you don't lose too much data (e.g., less than 50%).
>> Unfortunately, this seems to be your situation.
>>
>> 3. The mixed model (or, more properly, the generalized mixed model)
>> approach will involve a number of new assumptions not necessary in more
>> direct methods.
>>
>> 4. A Bayesian approach would be better than 1) above if you've got some
>> useful prior information.
>>
>> 5. If you have a lot of missing data (> 50%), as you appear to have from
>> your example, you generally will get more power than the McNemar test
>> from just ignoring the matching on subjects, treating it as unpaired
>> data (2 independent samples). Not as good as method 1), but the high
>> missing rate would justify the independence assumption, as it would
>> severely reduce whatever correlation is present on the same vs.
>> different subjects. The extra data may be worth more than the pairing,
>> particularly if there is a large percentage of concordant pairs in the
>> McNemar test of symmetry.
>>
>> 6. Finally, why are you even doing a test? Tests don't provide
>> physically useful information, only that your study has enough subjects
>> to statistically detect an effect or not. Wouldn't you be better off
>> asking how to obtain a valid confidence interval for the difference or
>> ratio of proportions for the two tests? The concordant pairs would then
>> be useful.
>>
>> At 03:17 PM 7/24/2009, Daniel Malter wrote:
>>
>>> Hi Tal, you can use the lme4 library and a random effects model. I will
>>> walk you through the example below (though, generally, you should ask a
>>> statistician at your school about this). At the very end, I will
>>> include a loop for Monte-Carlo simulations that shows that the
>>> estimation of the fixed effects is unbiased, which assures you that you
>>> are estimating the "correct" coefficient.
>>> In addition, as Robert says, you should remove all subjects for which
>>> both observations are missing.
>>>
>>> ##First, we simulate some data.
>>> ##Note how the data structure matches the data structure you have:
>>>
>>> id=rep(1:100,each=2)
>>> #the subject ID
>>>
>>> test=rep(0:1,100)
>>> # first or second measurement/test (your variable of interest)
>>>
>>> randeff=rep(rnorm(100,0,1),each=2)
>>> # a random effect for each subject
>>>
>>> linear.predictor=randeff+2*test
>>> #the linear predictor
>>> #note the coefficient on the fixed effect of "test" is 2
>>>
>>> probablty=exp(linear.predictor)/(1+exp(linear.predictor))
>>> #compute the probability using the linear predictor
>>>
>>> y=rbinom(200,1,probablty)
>>> #draw 0/1 dependent variables
>>> #with probability according to the variable probablty
>>>
>>> miss=sample(1:200,20,replace=FALSE)
>>> #some indices for missing variables
>>>
>>> y[miss]=NA
>>> #replace the dependent variable with NAs for the "miss"ing indices
>>>
>>>
>>> ##Some additional data preparation
>>>
>>> id=factor(id)
>>> #make sure id is coded as a factor/dummy
>>>
>>> mydata=data.frame(y,test,id)
>>> #bind the data you need for estimation in a dataframe
>>>
>>>
>>> ##Run the analysis
>>>
>>> library(lme4)
>>> reg=lmer(y~test+(1|id),binomial,data=mydata)
>>> summary(reg)
>>>
>>>
>>> ##And here are the Monte-Carlo simulations
>>>
>>> library(lme4)
>>>
>>> bootstraps=200
>>> estim=matrix(0,nrow=bootstraps,ncol=2)
>>>
>>> for(i in 1:bootstraps){
>>>   id=rep(1:100,each=2)
>>>   test=rep(0:1,100)
>>>   randeff=rep(rnorm(100,0,1),each=2)
>>>   linear.predictor=randeff+2*test
>>>   probablty=exp(linear.predictor)/(1+exp(linear.predictor))
>>>   y=rbinom(200,1,probablty)
>>>   miss=sample(1:200,20,replace=FALSE)
>>>   y[miss]=NA
>>>   id=factor(id)
>>>   mydata=data.frame(y,test,id)
>>>   reg=lmer(y~test+(1|id),binomial,data=mydata)
>>>   estim[i,]=fixef(reg)
>>> }
>>>
>>> mean(estim[,1])
>>> #mean estimate of the intercept from 200 MC-simulations
>>> #should be
>>> close to zero (no intercept in the linear.predictor)
>>>
>>> mean(estim[,2])
>>> #mean estimate of the 'test' fixed effect from 200 MC-simulations
>>> #should be close to 2 (as in the linear.predictor)
>>>
>>> #plot estimated coefficients from 200 MC simulations
>>> par(mfcol=c(1,2))
>>> hist(estim[,1],main="Intercept")
>>> hist(estim[,2],main="Effect of variable 'test'")
>>>
>>> HTH,
>>> Daniel
>>>
>>> -------------------------
>>> cuncta stricte discussurus
>>> -------------------------
>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Tal Galili
>>> Sent: Friday, July 24, 2009 1:23 PM
>>> To: r-help@r-project.org
>>> Subject: [R] How to test frequency independence (in a 2 by 2 table)
>>> with many missing values
>>>
>>> Hello dear R help group.
>>>
>>> My question is statistical and not R specific, yet I hope some of you
>>> might be willing to help.
>>>
>>> *Experiment settings*: We have a list of subjects. Each of them went
>>> through two tests, where the answer to each can be either 0 or 1.
>>> *Goal*: We want to know if the two experiments yielded different
>>> results in the subjects' answers.
>>> *Statistical test (and why it won't work)*: Naturally we would turn to
>>> performing a McNemar test. But here is the twist: we have missing
>>> values in our data. For our purpose, let's assume the missingness is
>>> completely at random, and we also have no data to use for imputation.
>>> Also, we have much more missing data for experiment number 2 than for
>>> experiment number 1.
>>>
>>> So the question is, under these settings, how do we test for an
>>> experiment effect on the outcome?
>>>
>>> So far I have thought of two options:
>>> 1) To perform the test only for subjects that have both values. But
>>> they are too scarce and will yield low power.
>>> 2) To treat the data as independent and do a Pearson's chi-square test
>>> (well, an exact Fisher test, that is) on all the non-missing data that
>>> we have. The problem with this is that our data is not fully
>>> independent (which is a prerequisite of the chi-square test, if I
>>> understood it correctly). So I am not sure if that is a valid procedure
>>> or not.
>>>
>>> Any insights will be warmly welcomed.
>>>
>>> P.S.: here is an example code producing this scenario.
>>>
>>> set.seed(102)
>>>
>>> x1 <- rbinom(100, 1, .5)
>>> x2 <- rbinom(100, 1, .3)
>>>
>>> X <- data.frame(x1,x2)
>>> tX <- table(X)
>>> margin.table(tX,1)
>>> margin.table(tX,2)
>>> mcnemar.test(tX)
>>>
>>> put.missings <- function(x.vector, na.percent) {
>>>   turn.na <- rbinom(length(x.vector), 1, na.percent)
>>>   x.vector[turn.na == 1] <- NA
>>>   return(x.vector)
>>> }
>>>
>>> x1.m <- put.missings(x1, .3)
>>> x2.m <- put.missings(x2, .6)
>>>
>>> tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m)))
>>> fisher.test(tX.m)
>>>
>>> With regards,
>>> Tal
>>>
>>> ----------------------------------------------
>>> My contact information:
>>> Tal Galili
>>> Phone number: 972-50-3373767
>>> FaceBook: Tal Galili
>>> My Blogs:
>>> http://www.r-statistics.com/
>>> http://www.talgalili.com
>>> http://www.biostatistics.co.il
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
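Relating to Tal's question 1) and Robert's point 6) above: prop.test() on the marginal counts gives both an asymptotic test of equal proportions and a confidence interval for the difference of the two proportions, while fisher.test() gives an exact test with a confidence interval for the odds ratio instead. A minimal sketch; the counts below are made up for illustration, not taken from the example data:

```r
# Hypothetical marginal counts after dropping NAs separately per test:
# number of "yes" answers and total non-missing answers for tests 1 and 2.
yes    <- c(35, 15)
totals <- c(70, 40)

# Asymptotic test of equal proportions; also returns a 95% CI for p1 - p2.
pt <- prop.test(yes, totals)
pt$estimate   # sample proportions: 0.500 and 0.375
pt$conf.int   # CI for the difference of proportions (Robert's point 6)

# The same marginals arranged as a 2x2 table for Fisher's exact test,
# which instead gives an exact p-value and a CI for the odds ratio.
tab <- rbind(test1 = c(yes = 35, no = 35),
             test2 = c(yes = 15, no = 25))
ft <- fisher.test(tab)
ft$p.value
ft$conf.int   # CI for the odds ratio, not for the difference

# Note: both treat the two samples as independent, i.e., they ignore the
# pairing on subjects (see Robert's point 5 on when that is defensible).
```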
>>
>> ================================================================
>> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS    e-mail: r...@lcfltd.com
>> Least Cost Formulations, Ltd.              URL: http://lcfltd.com/
>> 824 Timberlake Drive                       Tel: 757-467-0954
>> Virginia Beach, VA 23464-3239              Fax: 757-467-2947
>>
>> "Vere scire est per causas scire"
>> ================================================================

--
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il