Hello Daniel,
Thank you for the extended answer. You gave me one more reason to delve into mixed models in the future. I will be honest and say that I believe I will explore Robert's approach more, due to its simplicity. But nevertheless, I am very grateful for your reply.
With best regards,
Tal

On Sat, Jul 25, 2009 at 4:26 AM, Daniel Malter <dan...@umd.edu> wrote:
> Hi Tal, it is necessary to understand the method, but it is not necessary,
> I think, for all applied researchers to understand all the specifics or
> underlying processes (I know others would disagree on that). As to my
> suggestion: the Monte-Carlo simulations have no other purpose than showing
> that the method I suggested yields unbiased estimates, i.e., that the
> coefficients of the fixed effect(s) are not biased compared to their true
> values. If they were biased, say, if you got an effect of the "test"
> variable of 3 on average over the Monte-Carlo simulations even though the
> true coefficient is 2, then the estimate would be biased and, thus,
> garbage. It would tell you that the method is inappropriate for analyzing
> your data. However, since the linear mixed effects model recovers the true
> coefficient, it tells you that it is okay to use it.
>
> As to the mixed effects model itself (the first part of my response), you
> assume that
>
> Probability(answer=yes) =
>   exp(b*test + random.effect(i)) / (1 + exp(b*test + random.effect(i)))
>
> Modeling this nonlinear shape makes sure that values outside the interval
> (0, 1) cannot occur. The model predicts the probability with which an
> individual is expected to give yes as an answer to the respective
> question. You are interested in the estimate of "b," i.e., how "test"
> affects the probability of receiving yes as an answer. And this is what
> the mixed effects model gives you, while respecting that there is assumed
> random normal variation among the subjects (in other words, respecting
> that some subjects are just more likely than others to answer yes).
>
> But Robert has pointed out many valuable alternative approaches.
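The link function Daniel describes can be sketched in a few lines of R. This is only an illustration of the formula above; the coefficient value 2 and the random-effect values are made-up examples, not estimates from any fitted model:

```r
# Inverse logit: maps any real-valued linear predictor into (0, 1),
# which is why predicted probabilities cannot fall outside that interval.
inv.logit <- function(x) exp(x) / (1 + exp(x))

b <- 2                        # example value of the "test" coefficient
random.effect <- c(-1, 0, 1)  # three hypothetical subjects around the mean

# Predicted P(answer = yes) for test = 0 vs. test = 1, per subject
p.test0 <- inv.logit(b * 0 + random.effect)
p.test1 <- inv.logit(b * 1 + random.effect)

round(p.test0, 3)   # 0.269 0.500 0.731
round(p.test1, 3)   # 0.731 0.881 0.953

# Base R's plogis() computes the same inverse logit
all.equal(inv.logit(1.5), plogis(1.5))   # TRUE
```

(A side note on the code later in this thread: in current versions of lme4, the binomial model is fitted with glmer(y ~ test + (1|id), family = binomial, data = mydata) rather than the older lmer(..., binomial) call.)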
>
> Best,
> Daniel
>
> -------------------------
> cuncta stricte discussurus
> -------------------------
>
> ------------------------------
> *From:* Tal Galili [mailto:tal.gal...@gmail.com]
> *Sent:* Friday, July 24, 2009 7:07 PM
> *To:* Robert A LaBudde
> *Cc:* Daniel Malter; r-help@r-project.org
> *Subject:* Re: [R] How to test frequency independence (in a 2 by 2 table) with many missing values
>
> Hello dear Robert and Daniel,
> I read your replies with much interest, thank you very much for them!
>
> Dear Daniel,
> Your approach, as Robert said, "whopped me alongside the head" as well. So
> much so that I fear it will be much more than my current level of
> understanding can take for the project I have at hand. For the time being
> (and as much as it saddens me) I will not pursue this as rigorously as it
> deserves, and hope to look at it again after I have done my homework of
> studying mixed models to the degree needed to understand your reply. Thank
> you for mentioning the use of Monte-Carlo: although I have had the chance
> to work with it in other situations, somehow it didn't occur to me to just
> have a go at it for the question I brought up in this e-mail. So thank you
> again for that, and for the very interesting response.
>
> Dear Robert,
> In response to your comments in the second e-mail, here are a few
> questions:
> 1) Do you think I could do an LRT (based on ML) by using prop.test on the
> marginals of the tables? I have to admit I am not fully sure how this is
> different from using fisher.test on the marginals of the tables (I have
> the feeling this is something very basic about which I am showing my
> ignorance, so I apologize for that).
> Regarding the resampling alternative, I am not sure how to try and do
> that. Could you give me a pointer or two on how to try and use the
> survival package for my situation?
> (I don't have any experience using that package for this purpose, and some
> general direction would be of great help.)
> 2) I had the feeling that the McNemar test would be a poor choice, which
> is what led me to writing to the help group :)
> 3) Regarding the assumptions of the mixed models, that is in part what
> "scares" me. And since I (for now) don't have experience using these
> models, I am not so keen on taking them on just yet.
> 4) I didn't think about using Bayesian methods for it. I admit I have
> never tried them in R. If you have a name or two to drop me, I would love
> to go through them, at least for the educational value.
> 5) Doing this was exactly my thought, but I wasn't sure how valid it would
> be. From what you wrote, it sounds like it might be. So what I will do
> (assuming I receive a bit of help on some of the tests you brought up) is
> to try to compare the method in your section "5)" to the method(s) in
> "1)" (using Monte-Carlo), to see which one might work better.
> 6) I am sorry to say that the data I am analyzing wouldn't give me
> concordant pairs. But regarding your question: I would love to be able to
> find a solution for "how to obtain a valid confidence interval for the
> ratio of proportions for the two tests", but I don't know where to even
> begin on that one. Any insights would be of help.
>
> Both Robert and Daniel, I am amazed at the depth and length of your
> responses and would like to thank you again for your generosity.
> With much regards,
> Tal
>
>
> On Fri, Jul 24, 2009 at 11:43 PM, Robert A LaBudde <r...@lcfltd.com> wrote:
>
>> Daniel's response about using a mixed model "whopped me alongside the
>> head", so now I'm thinking more clearly. Here are a few more comments:
>>
>> 1. The LRT based on ML, or the resampling alternative, is probably the
>> most powerful among reasonable algorithms to do a test on your problem.
>> You might even get a survival package to do it for you.
>>
>> 2. The McNemar test with casewise deletion should work well enough, so
>> long as you don't lose too much data (e.g., less than 50%).
>> Unfortunately, this seems to be your situation.
>>
>> 3. The mixed model (or, more properly, the generalized mixed model)
>> approach will involve a number of new assumptions not necessary in more
>> direct methods.
>>
>> 4. A Bayesian approach would be better than 1) above if you've got some
>> useful prior information.
>>
>> 5. If you have a lot of missing data (> 50%), as you appear to have from
>> your example, you generally will get more power than the McNemar test
>> from just ignoring the matching on subjects, treating it as unpaired
>> data (2 independent samples). Not as good as method 1), but the high
>> missing rate would justify the independence assumption, as it would
>> severely reduce whatever correlation is present on the same vs.
>> different subjects. The extra data may be worth more than the pairing,
>> particularly if there is a large percentage of concordant pairs in the
>> McNemar test of symmetry.
>>
>> 6. Finally, why are you even doing a test? Tests don't provide
>> physically useful information, only that your study has enough subjects
>> to statistically detect an effect or not. Wouldn't you be better off
>> asking how to obtain a valid confidence interval for the difference or
>> ratio of proportions for the two tests? The concordant pairs would then
>> be useful.
>>
>> At 03:17 PM 7/24/2009, Daniel Malter wrote:
>>
>>> Hi Tal, you can use the lme4 library and a random effects model. I will
>>> walk you through the example below (though, generally, you should ask a
>>> statistician at your school about this). At the very end, I will
>>> include a loop for Monte-Carlo simulations that shows that the
>>> estimation of the fixed effects is unbiased, which assures you that you
>>> are estimating the "correct" coefficient.
>>> In addition, as Robert says, you should remove all subjects for which
>>> both observations are missing.
>>>
>>> ##First, we simulate some data.
>>> ##Note how the data structure matches the data structure you have:
>>>
>>> id=rep(1:100,each=2)
>>> #the subject ID
>>>
>>> test=rep(0:1,100)
>>> # first or second measurement/test (your variable of interest)
>>>
>>> randeff=rep(rnorm(100,0,1),each=2)
>>> # a random effect for each subject
>>>
>>> linear.predictor=randeff+2*test
>>> #the linear predictor
>>> #note the coefficient on the fixed effect of "test" is 2
>>>
>>> probablty=exp(linear.predictor)/(1+exp(linear.predictor))
>>> #compute the probability using the linear predictor
>>>
>>> y=rbinom(200,1,probablty)
>>> #draw 0/1 dependent variables
>>> #with probability according to the variable probablty
>>>
>>> miss=sample(1:200,20,replace=FALSE)
>>> #some indices for missing variables
>>>
>>> y[miss]=NA
>>> #replace the dependent variable with NAs for the "miss"ing indices
>>>
>>>
>>> ##Some additional data preparation
>>>
>>> id=factor(id)
>>> #make sure id is coded as a factor/dummy
>>>
>>> mydata=data.frame(y,test,id)
>>> #bind the data you need for estimation in a dataframe
>>>
>>>
>>> ##Run the analysis
>>>
>>> library(lme4)
>>> reg=lmer(y~test+(1|id),binomial,data=mydata)
>>> summary(reg)
>>>
>>>
>>> ##And here are the Monte-Carlo simulations
>>>
>>> library(lme4)
>>>
>>> bootstraps=200
>>> estim=matrix(0,nrow=bootstraps,ncol=2)
>>>
>>> for(i in 1:bootstraps){
>>>   id=rep(1:100,each=2)
>>>   test=rep(0:1,100)
>>>   randeff=rep(rnorm(100,0,1),each=2)
>>>   linear.predictor=randeff+2*test
>>>   probablty=exp(linear.predictor)/(1+exp(linear.predictor))
>>>   y=rbinom(200,1,probablty)
>>>   miss=sample(1:200,20,replace=FALSE)
>>>   y[miss]=NA
>>>   id=factor(id)
>>>   mydata=data.frame(y,test,id)
>>>   reg=lmer(y~test+(1|id),binomial,data=mydata)
>>>   estim[i,]=fixef(reg)
>>> }
>>>
>>> mean(estim[,1])
>>> #mean estimate of the intercept from 200 MC-simulations
>>> #should be
>>> close to zero (no intercept in the linear.predictor)
>>>
>>> mean(estim[,2])
>>> #mean estimate of the 'test' fixed effect from 200 MC-simulations
>>> #should be close to 2 (as in the linear.predictor)
>>>
>>> #plot estimated coefficients from 200 MC simulations
>>> par(mfcol=c(1,2))
>>> hist(estim[,1],main="Intercept")
>>> hist(estim[,2],main="Effect of variable 'test'")
>>>
>>> HTH,
>>> Daniel
>>>
>>> -------------------------
>>> cuncta stricte discussurus
>>> -------------------------
>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Tal Galili
>>> Sent: Friday, July 24, 2009 1:23 PM
>>> To: r-help@r-project.org
>>> Subject: [R] How to test frequency independence (in a 2 by 2 table)
>>> with many missing values
>>>
>>> Hello dear R help group.
>>>
>>> My question is statistical and not R specific, yet I hope some of you
>>> might be willing to help.
>>>
>>> *Experiment settings*: We have a list of subjects. Each of them went
>>> through two tests, where the answer to each can be either 0 or 1.
>>> *Goal*: We want to know if the two experiments yielded different
>>> results in the subjects' answers.
>>> *Statistical test (and why it won't work)*: Naturally we would turn to
>>> performing a McNemar test. But here is the twist: we have missing
>>> values in our data. For our purpose, let's assume the missingness is
>>> completely at random, and we also have no data to use for imputation.
>>> Also, we have much more missing data for experiment number 2 than for
>>> experiment number 1.
>>>
>>> So the question is, under these settings, how do we test for an
>>> experiment effect on the outcome?
>>>
>>> So far I have thought of two options:
>>> 1) To perform the test only for subjects that have both values. But
>>> they are too scarce and will yield low power.
>>> 2) To treat the data as independent and do a Pearson's chi-square test
>>> (well, an exact Fisher test, that is) on all the non-missing data that
>>> we have. The problem with this is that our data is not fully
>>> independent (which is a prerequisite of the chi-square test, if I
>>> understood it correctly). So I am not sure if that is a valid procedure
>>> or not.
>>>
>>> Any insights will be warmly welcomed.
>>>
>>> P.S.: here is an example code producing this scenario.
>>>
>>> set.seed(102)
>>>
>>> x1 <- rbinom(100, 1, .5)
>>> x2 <- rbinom(100, 1, .3)
>>>
>>> X <- data.frame(x1,x2)
>>> tX <- table(X)
>>> margin.table(tX,1)
>>> margin.table(tX,2)
>>> mcnemar.test(tX)
>>>
>>> put.missings <- function(x.vector, na.percent) {
>>>   turn.na <- rbinom(length(x.vector), 1, na.percent)
>>>   x.vector[turn.na == 1] <- NA
>>>   return(x.vector)
>>> }
>>>
>>> x1.m <- put.missings(x1, .3)
>>> x2.m <- put.missings(x2, .6)
>>>
>>> tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m)))
>>> fisher.test(tX.m)
>>>
>>> With regards,
>>> Tal
>>>
>>> ----------------------------------------------
>>> My contact information:
>>> Tal Galili
>>> Phone number: 972-50-3373767
>>> FaceBook: Tal Galili
>>> My Blogs:
>>> http://www.r-statistics.com/
>>> http://www.talgalili.com
>>> http://www.biostatistics.co.il
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
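Relating to Tal's question 1) and Robert's point 6) above: prop.test() on the marginal counts gives both an asymptotic test of equal proportions and a confidence interval for the difference of the two proportions, while fisher.test() gives an exact test with a confidence interval for the odds ratio instead. A minimal sketch; the counts below are made up for illustration, not taken from the example data:

```r
# Hypothetical marginal counts after dropping NAs separately per test:
# number of "yes" answers and total non-missing answers for tests 1 and 2.
yes    <- c(35, 15)
totals <- c(70, 40)

# Asymptotic test of equal proportions; also returns a 95% CI for p1 - p2.
pt <- prop.test(yes, totals)
pt$estimate   # sample proportions: 0.500 and 0.375
pt$conf.int   # CI for the difference of proportions (Robert's point 6)

# The same marginals arranged as a 2x2 table for Fisher's exact test,
# which instead gives an exact p-value and a CI for the odds ratio.
tab <- rbind(test1 = c(yes = 35, no = 35),
             test2 = c(yes = 15, no = 25))
ft <- fisher.test(tab)
ft$p.value
ft$conf.int   # CI for the odds ratio, not for the difference

# Note: both treat the two samples as independent, i.e., they ignore the
# pairing on subjects (see Robert's point 5 on when that is defensible).
```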
>>
>> ================================================================
>> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS    e-mail: r...@lcfltd.com
>> Least Cost Formulations, Ltd.              URL: http://lcfltd.com/
>> 824 Timberlake Drive                       Tel: 757-467-0954
>> Virginia Beach, VA 23464-3239              Fax: 757-467-2947
>>
>> "Vere scire est per causas scire"
>> ================================================================

--
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il