Hi Tal, it is necessary to understand the method, but not for all applied researchers it is necessary, I think, to understand all the specifics or underlying processes (I know others would disagree on that). As to my suggestion: The Monte-Carlo simulations have no other purpose than showing that the method I suggested yields unbiased estimates, i.e., the coefficients of the fixed effect(s) are not biased compared to their true value. If they were biased, say, you would get an effect of the "test" variable of 3 on average over the Monte-Carlo simulations even though the true coefficient is 2, then the estimate would be biased and, thus, garbage. It would tell you that the method is inappropriate to analyze your data. However, since the linear mixed effect model brings about the true coefficient, it tells you that it is okay to use it. As to the mixed effects model itself (the first part of my response), you assume that Probability(answer=yes) = exp(b*test+random.effect(i))/(1+exp(b*test+random.effect(i))) The modeling of this nonlinear shapes makes sure that values outside the interval 0/1 cannot occur. The model predicts the probability with which an individual is expected to give yes as an answer to the respective question. You are interested in the estimate for "b," i.e., how does "test" affect the probability to receive yes as an answer. And this is what the mixed effects model gives you, respecting that there is assumed random normal variation among the subject (in other words, respecting that some are just more likely than others to answer yes). But Robert has pointed out many valuable alternative approaches. Best, Daniel ------------------------- cuncta stricte discussurus -------------------------
_____ Von: Tal Galili [mailto:tal.gal...@gmail.com] Gesendet: Friday, July 24, 2009 7:07 PM An: Robert A LaBudde Cc: Daniel Malter; r-help@r-project.org Betreff: Re: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values Hello dear Robert and Daniel, I read your replies with much interest, thank you very much for them! Dear Daniel, Your approach is, as Robert said, "whopped me alongside the head" as well. So much so that I fear it will be much more then my current level of understanding can take for the project I have at hand. For the time being (and as much as it saddens me) I will not pursue this as rigorously as it deserves, and hope to look at it again after I will do my homework of studying mixed model do the degree of understanding your reply. I will thank you for mentioning the use of Monte-Carlo, since although I had the chance of working with it on other situations, somehow it didn't occur to me to just have a go at it for the question I brought up in this e-mail. So thank you again for that, and for the very interesting response. Dear Robert, With response to your comments in the second e-mail, here are a few questions: 1) Do you think I could do LRT (based on ML) by using prop.test on the marginals of the tables? I have to admit I am not fully sure how this is different then using the fisher.test on the marginals of the tables (I have the feeling this is something very basic of which I am showing my ignorance about, so I apologize for that). Regarding the resampling alternative, that I am not sure how to try and do. Could you give me a a pointer or two on how to try and use the survival package for my situation? (I don't have any experience using that package for this purpose, and some general direction would be of great help). 2) I had the feeling that the mcnemar test would be a poor choice, which is what led me to writing to the help group :) 3) Regarding the assumptions of the mixed models, that is in part what "scares" me. And since I (for now) don't have experience using these models, I am not so keen on taking them on just yet. 4) I didn't think about using Bayesian methods for it. I admit I never tried them in R. If you have a name or two to drop me, I would love to go through it, at least for the educational value of it. 5) Doing this was exactly my thought, but I wasn't sure how much this would be valid. It sounds from what you wrote that it might be. So what I will do (assuming I will receive a bit of help on some of the tests you brought up) is to try and compare this method in section "5)" you wrote to the method(s) in "1)" (using Monte-Carlo) , so to see which one might work better. 6) I am sorry to say that the data I am analyzing wouldn't give me concordant pairs. But regarding your question - I would love to be able to find a solution for "how to obtain a valid confidence interval for the ratio of proportions for the two tests". But don't know where to even begin on that one. Any insights would be of help. Both Robert and Daniel, I am amazed at the depth and length of your responses and would like to thank you again for your generosity. With much regards, Tal On Fri, Jul 24, 2009 at 11:43 PM, Robert A LaBudde <r...@lcfltd.com> wrote: Daniel's response about using a mixed model "whopped me alongside the head", so now I'm thinking more clearly. Here are a few more comments: 1. The LRT based on ML or the resampling alternative is probably the most powerful among reasonable algorithms to do a test on your problem. You might even get a survival package to do it for you. 2. The McNemar test with casewise deletion should work well enough, so long as you don't lose too much data (e.g., less than 50%). Unfortunately, this seems to be your situation. 3. The mixed model (or, more properly, the generalized mixed model) approach will involve a number of new assumptions not necessary in more direct methods. 4. A Bayesian approach would be better than 1) above if you've got some useful prior information. 5. If you have a lot of missing data (> 50%), as you appear to have from your example, you generally will get more power than the McNemar test from just ignoring the matching on subjects, treating it as unpaired data (2 independent samples). Not as good as method 1), but the high missing rate would justify the independence assumption, as it would reduce severely whatever correlation is present on the same vs. different subjects. The extra data may be worth more than the pairing, particularly if there is a large percentage of concordant pairs in the McNemar test of symmetry. 6. Finally, why are you even doing a test? Tests don't provide physically useful information, only that your study has enough subjects to statistically detect an effect or not. Wouldn't you be better off asking how to obtain a valid confidence interval for the difference or ratio of proportions for the two tests? The concordant pairs would then be useful. At 03:17 PM 7/24/2009, Daniel Malter wrote: Hi Tal, you can use the lme4 library and use a random effects model. I will walk you through the example below (though, generally you should ask a statistician at your school about this). At the very end, I will include a loop for Monte-Carlo simulations that shows that the estimation of the fixed effects is unbiased, which assures you that you are estimating the "correct" coefficient. In addition, as Robert says, you should remove all subjects for which both observations are missing. ##First, we simulate some data. ##Note how the data structure matches the data structure you have: id=rep(1:100,each=2) #the subject ID test=rep(0:1,100) # first or second measurement/test (your variable of interest) randeff=rep(rnorm(100,0,1),each=2) # a random effect for each subject linear.predictor=randeff+2*test #the linear predictor #note the coefficient on the fixed effect of "test" is 2 probablty=exp(linear.predictor)/(1+exp(linear.predictor)) #compute the probability using the linear predictor y=rbinom(200,1,probablty) #draw 0/1 dependent variables #with probability according to the variable probablty miss=sample(1:200,20,replace=FALSE) #some indices for missing variables y[miss]=NA #replace the dependent variable with NAs for the "miss"ing indices ##Some additional data preparation id=factor(id) #make sure id is coded as a factor/dummy mydata=data.frame(y,test,id) #bind the data you need for estimation in a dataframe ##Run the analysis library(lme4) reg=lmer(y~test+(1|id),binomial,data=mydata) summary(reg) ##And here are the Monte-Carlo simulations library(lme4) bootstraps=200 estim=matrix(0,nrow=bootstraps,ncol=2) for(i in 1:bootstraps){ id=rep(1:100,each=2) test=rep(0:1,100) randeff=rep(rnorm(100,0,1),each=2) linear.predictor=randeff+2*test probablty=exp(linear.predictor)/(1+exp(linear.predictor)) y=rbinom(200,1,probablty) miss=sample(1:200,20,replace=FALSE) y[miss]=NA id=factor(id) mydata=data.frame(y,test,id) reg=lmer(y~test+(1|id),binomial,data=mydata) estim[i,]=...@fixef } mean(estim[,1]) #mean estimate of the intercept from 200 MC-simulations #should be close to zero (no intercept in the linear.predictor) mean(estim[,2]) #mean estimate of the 'test' fixed effect from 200 MC-simulations #should be close to 2 (as in the linear.predictor) #plot estimated coefficients from 200 MC simulations par(mfcol=c(1,2)) hist(estim[,1],main="Intercept") hist(estim[,2],main="Effect of variable 'test'") HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Ursprüngliche Nachricht----- Von: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Im Auftrag von Tal Galili Gesendet: Friday, July 24, 2009 1:23 PM An: r-help@r-project.org Betreff: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values Hello dear R help group. My question is statistical and not R specific, yet I hope some of you might be willing to help. *Experiment settings*: We have a list of subjects. each of them went through two tests with the answer to each can be either 0 or 1. *Goal*: We want to know if the two experiments yielded different results in the subjects answers. *Statistical test (and why it won't work)*: Naturally we would turn to performing a mcnemar test. But here is the twist: we have missing values in our data. For our purpose, let's assume the missingnes is completely at random, and we also have no data to use for imputation. Also, we have much more missing data for experiment number 2 then in experiment number 1. So the question is, under these settings, how do we test for experiment effect on the outcome? So far I have thought of two options: 1) To perform the test only for subjects that has both values. But they are too scarce and will yield low power. 2) To treat the data as independent and do a pearson's chi square test (well, an exact fisher test that is) on all the non-missing data that we have. The problem with this is that our data is not fully independent (which is a prerequisite to chi test, if I understood it correctly). So I am not sure if that is a valid procedure or not. Any insights will be warmly welcomed. p.s: here is an example code producing this scenario. set.seed(102) x1 <- rbinom(100, 1, .5) x2 <- rbinom(100, 1, .3) X <- data.frame(x1,x2) tX <- table(X) margin.table(tX,1) margin.table(tX,2) mcnemar.test(tX) put.missings <- function(x.vector, na.percent) { turn.na <- rbinom(length(x.vector), 1, na.percent) x.vector[turn.na == 1] <- NA return(x.vector) } x1.m <- put.missings(x1, .3) x2.m <- put.missings(x2, .6) tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m))) fisher.test(tX.m) With regards, Tal -- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ================================================================ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: r...@lcfltd.com Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 "Vere scire est per causas scire" ================================================================ -- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.