My reservations about the methodology aside, it's probably a good idea to include an error check for the case where the expected probability of one of the events is 0 (in which case, unsurprisingly, the chi-sq test rejects the null hypothesis). That is what happens in the first row of your data, where the second expected value is 0.
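To make the zero case concrete, here is a rough sketch of what chisq.test() does when one expected probability is 0. The numbers are taken from rows 1 and 3 of the example data; the names p_row1 and p_row3 are mine, purely for illustration, and suppressWarnings() only hides the usual "approximation may be incorrect" warning.

```r
# Row 1: observed 84 and 22 (V2, V3), expected 84 and 0 (W2, W3).
# The zero expected count makes that cell's term 22^2 / 0 = Inf.
p_row1 <- suppressWarnings(
  chisq.test(c(84, 22), p = c(84, 0) / sum(c(84, 0)))$p.value
)

# Row 3: observed 48 and 0, expected 48 and 0.
# Here the zero cell contributes 0 / 0 = NaN to the statistic.
p_row3 <- suppressWarnings(
  chisq.test(c(48, 0), p = c(48, 0) / sum(c(48, 0)))$p.value
)

print(p_row1)
print(p_row3)
```

As far as I can tell, the first call comes back with a p-value of 0 (an infinite statistic) and the second with NA (a NaN statistic), so neither is a meaningful result. Returning NA explicitly whenever an expected value is 0 at least makes those rows easy to spot.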
Try this:

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

# Takes one row of Y: the first four entries are observed, the rest expected
Fnc <- function(Y) {
    Y1 = Y[1:4]
    Y2 = Y[-(1:4)]
    # positions of the two largest observed values
    id = order(Y1,decreasing=TRUE)[1:2]
    # error check: skip rows where an expected value is 0
    if (any(Y2[id]==0)) {return(NA)}
    p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
    return(p)
}

Res = apply(Y,1,Fnc)

# You could even pre-calculate the indices to speed it up a little

id = apply(Y[,1:4],1,order,decreasing=TRUE)[1:2,]
Y.Id = cbind(Y,t(id))

# Same as Fnc, but reads the pre-calculated indices from columns 9 and 10
Fnc2 <- function(Y) {
    Y1 = Y[1:4]
    Y2 = Y[5:8]
    id = Y[9:10]
    if (any(Y2[id]==0)) {return(NA)}
    p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
    return(p)
}

Res2 = apply(Y.Id,1,Fnc2)

identical(Res,Res2)  # TRUE

Hope this helps,

Michael

On Thu, Aug 18, 2011 at 4:16 AM, Petr PIKAL <petr.pi...@precheza.cz> wrote:

> Hi
>
> r-help-boun...@r-project.org wrote on 17.08.2011 21:07:43:
>
> > Dear Michael,
> >
> > Thanks a lot for your reply and for your help.I was struggling so much but
> > your suggestion showed me a path to the solution of my problem.I have
> > tried your code on my data frame step wise and it looks fine to me.But
> > when i tried chi square test-
> >
> > res=chisq.test(y1[id],p=y2[id],rescale.p=T)
> >
> > Chi-squared test for given probabilities
> >
> > data: y1[id]
> > X-squared = NaN, df = 19997, p-value = NA
> >
> > Warning message:
> > In chisq.test(y1[id], p = y2[id], rescale.p = T) :
> > Chi-squared approximation may be incorrect
>
> Check what Y1[id] is.
>
> Split Yn to lists
> l1<-split(Y1[id], rep(1:6, each=2))
> l2<-split(Y2[id], rep(1:6, each=2))
>
> do mapply on those list. But the result is rather silly as Michael pointed
> out.
>
> mapply(chisq.test, l1, l2, SIMPLIFY=F)
>
> or to get only p values
>
> lapply(mapply(chisq.test, l1, l2, SIMPLIFY=F),"[", 3)
>
> Regards
> Petr
>
> > It is not giving p value.Then i checked observed and expected values,it is
> > taking all numbers under consideration.but as i mentioned earlier i want p
> > value for each row and therefore degree of freedom will be 1. example-
> >
> > I have a data frame with 8 columns-
> >   V1 V2 V3 V4 W1 W2 W3 W4
> > 1  0 84 22 10  0 84  0  0
> > 2 35 84  0  0 22 84  0  0
> > 3  0  0  0 48  0  0  0 48
> > 4  0 48  0  0  0 48  0  0
> > 5  0 84  0  0  0 84  0  0
> > 6  0  0  0 48  0  0  0 48
> >
> > example for first row is-
> >
> > first two largest values are 84(in V2) and 22 (in V3).so these are
> > considered as observed values.Now if the largest values are in V2 and
> > V3,we have to pick expected values from W2 and W3 which are 84 and 0.I
> > know for chi square test values should not be 0 but we will ignore the
> > warning.
> >
> > now it should generate p value for next row taking 35 and 84 (v1 and v2)
> > as observed and 22 and 84 (w1 and w2) as expected.so here it will do chi
> > square test for all 6 rows and will generate 6 p values.My data frame has
> > lot of rows(approx. 9999).
> >
> > Can you please help me with this.
> >
> > Thanking you,
> > Warm Regards
> > Vikas Bansal
> > Msc Bioinformatics
> > Kings College London
> > ________________________________________
> > From: R. Michael Weylandt [michael.weyla...@gmail.com]
> > Sent: Wednesday, August 17, 2011 7:11 PM
> > To: Bansal, Vikas
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Chi square test on data frame
> >
> > I think everything below is right, but it's all a little helter-skelter so
> > take it with a grain of salt:
> >
> > First things first, make your data with dput() for the list.
> >
> > Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
> > 0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
> > 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
> > ), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
> > "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
> >
> > Now,
> >
> > Y1 = Y[,1:4]
> > Y2 = Y[,-(1:4)]
> >
> > id = apply(Y1,1,order,decreasing=T)[1:2,]
> > # This has the columns you want in each row, but it's not directly
> > # appropriate for subsetting
> > # Specifically, the problem is that the row information is implicit in
> > # where the col index is in id
> > # We directly extract and force into a 2-col vector that gives rows and
> > # columns for each data point
> > id = cbind(as.vector(col(id)),as.vector(id))
> >
> > Now you can take
> >
> > Y1[id] as the observed values and Y2[id] as the expected.
> >
> > But, to be honest, it sounds like you have more problems in using a chi-sq
> > test than anything else. Beyond all the zeros, you should note that you
> > always have #obs >= #expected because Y1>= Y2. I'll leave that up to you
> > though.
> >
> > Hope this helps and please make sure you can take my code apart piece by
> > piece to understand it: there's some odd data manipulation that takes
> > advantage of R's way of coercing matrices to vectors and if your actual
> > data isn't like the provided example, you may have to modify.
> >
> > Michael Weylandt
> >
> > On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas <vikas.ban...@kcl.ac.uk> wrote:
> > Is there anyone who can help me with chi square test on data frame.I am
> > struggling from last 2 days.I will be very thankful to you.
> >
> > Dear all,
> >
> > I have been working on this problem from so many hours but did not find
> > any solution.
> > I have a data frame with 8 columns-
> >   V1 V2 V3 V4 W1 W2 W3 W4
> > 1  0 84 22 10  0 84  0  0
> > 2 35 84  0  0 22 84  0  0
> > 3  0  0  0 48  0  0  0 48
> > 4  0 48  0  0  0 48  0  0
> > 5  0 84  0  0  0 84  0  0
> > 6  0  0  0 48  0  0  0 48
> >
> > from first four columns, for each row I have to take two largest values.
> > and these two values will be considered as observed values.And from last
> > four column we will get the expected values.So i have to perform chi
> > square test for each row to get p values.
> >
> > example for first row is-
> >
> > first two largest values are 84(in V2) and 22 (in V3).so these are
> > considered as observed values.Now if the largest values are in V2 and
> > V3,we have to pick expected values from W2 and W3 which are 84 and 0.I
> > know for chi square test values should not be 0 but we will ignore the
> > warning.
> > Now as we have observed value as well as expected we have to perform chi
> > square test to get p values for each row in a new column.
> >
> > So far I was working as returning the index for two largest value with-
> > sort.int(df,index.return=TRUE)$ix[c(4,3)]
> > but it does not accept data frame.
> >
> > Can you please give some idea how to do this,because it is very tricky and
> > after studying a lot, I am not able to perform.Please help.
> >
> > Thanking you,
> > Warm Regards
> > Vikas Bansal
> > Msc Bioinformatics
> > Kings College London
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.