Re: [R] Chi square test on data frame

R. Michael Weylandt Thu, 18 Aug 2011 06:56:19 -0700

My reservations about the methodology aside, it's probably a good idea to
include an error-checking line for the case where the probability of one of
the two events is 0 (in which case, unsurprisingly, the chi-sq test rejects
the null hypothesis), as happens in the first row of your data.

Try this:

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

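Fnc <- function(Y) {
    Y1 = Y[1:4]      # observed counts (V1..V4) for this row
    Y2 = Y[-(1:4)]   # expected counts (W1..W4) for this row
    id = order(Y1,decreasing=TRUE)[1:2]   # positions of the two largest observed values
    # a zero expected count means a zero probability, so skip the test
    if (any(Y2[id]==0)) {return(NA)}
    p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
    return(p)
}
Res = apply(Y,1,Fnc)   # one p-value (or NA) per row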
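
To see why the zero check is worth having, here is a minimal sketch of what
happens for row 1 if the check is left out. The numbers come from the example
data; the names obs and expd are just illustrative.

```r
# Row 1 of the example: the two largest observed values are V2 = 84 and
# V3 = 22, and the matching expected values are W2 = 84 and W3 = 0.
obs  <- c(84, 22)
expd <- c(84, 0)

# With a zero probability, one expected count is 0, so the statistic
# divides by zero. suppressWarnings() hides the "approximation may be
# incorrect" warning that chisq.test() raises for small expected counts.
res <- suppressWarnings(chisq.test(obs, p = expd / sum(expd)))

res$statistic   # infinite
res$p.value     # 0
```

A p-value of exactly 0 here says nothing useful about the data, which is why
returning NA for such rows seems the safer choice.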

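# You could even pre-calculate the indices to speed it up a little

# id is 2 x nrow(Y): for each row, the columns of the two largest observed values
id = apply(Y[,1:4],1,order,decreasing=TRUE)[1:2,]
Y.Id = cbind(Y,t(id))   # append the indices as columns 9 and 10

Fnc2 <-  function(Y) {
    Y1 = Y[1:4]     # observed counts
    Y2 = Y[5:8]     # expected counts
    id = Y[9:10]    # pre-calculated indices of the two largest observed values
    if (any(Y2[id]==0)) {return(NA)}
    p = chisq.test(Y1[id],p = Y2[id]/sum(Y2[id]))$p.value
    return(p)
}
Res2 = apply(Y.Id,1,Fnc2)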

> identical(Res,Res2)
TRUE
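
Since the original data is a data frame rather than a matrix, here is a
hedged sketch of how the per-row p-values could be attached as a new column.
The data frame df is a made-up three-row stand-in with the same layout as the
example, and row_p is a hypothetical name; the logic is the same as Fnc above,
repeated only so the block runs on its own.

```r
# Hypothetical data frame laid out like the example (first three rows only)
df <- data.frame(
    V1 = c(0, 35, 0), V2 = c(84, 84, 0), V3 = c(22, 0, 0), V4 = c(10, 0, 48),
    W1 = c(0, 22, 0), W2 = c(84, 84, 0), W3 = c(0, 0, 0),  W4 = c(0, 0, 48)
)

# Same logic as Fnc: test the two largest observed values against the
# matching expected values, or return NA if an expected value is 0
row_p <- function(r) {
    obs  <- r[1:4]
    expd <- r[5:8]
    id   <- order(obs, decreasing = TRUE)[1:2]
    if (any(expd[id] == 0)) return(NA_real_)
    chisq.test(obs[id], p = expd[id] / sum(expd[id]))$p.value
}

# as.matrix() makes explicit the coercion that apply() would do anyway
df$p.value <- apply(as.matrix(df[, 1:8]), 1, row_p)
df
```

Only the second row has two non-zero expected values, so it is the only one
that gets a p-value; the others come back as NA.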

Hope this helps,

Michael

On Thu, Aug 18, 2011 at 4:16 AM, Petr PIKAL <petr.pi...@precheza.cz> wrote:

> Hi
>
> r-help-boun...@r-project.org wrote on 17.08.2011 21:07:43:
>
> >
> > Dear Michael,
> >
> > Thanks a lot for your reply and for your help. I was struggling so much,
> > but your suggestion showed me a path to the solution of my problem. I have
> > tried your code on my data frame stepwise and it looks fine to me. But
> > when I tried the chi-square test:
> >
> > res=chisq.test(y1[id],p=y2[id],rescale.p=T)
> >
> >         Chi-squared test for given probabilities
> >
> > data:  y1[id]
> > X-squared = NaN, df = 19997, p-value = NA
> >
> > Warning message:
> > In chisq.test(y1[id], p = y2[id], rescale.p = T) :
> >   Chi-squared approximation may be incorrect
>
> Check what Y1[id] is.
>
> Split Y1[id] and Y2[id] into per-row lists:
> l1<-split(Y1[id], rep(1:6, each=2))
> l2<-split(Y2[id], rep(1:6, each=2))
>
> Then run mapply over those lists, passing the expected values as p. But the
> result is rather silly, as Michael pointed out.
>
> f <- function(o, e) chisq.test(o, p=e, rescale.p=TRUE)
> mapply(f, l1, l2, SIMPLIFY=FALSE)
>
> or, to get only the p-values:
>
> lapply(mapply(f, l1, l2, SIMPLIFY=FALSE), "[", 3)
>
> Regards
> Petr
>
> >
> > It is not giving a p-value. Then I checked the observed and expected
> > values; it is taking all the numbers into consideration. But as I
> > mentioned earlier, I want a p-value for each row, and therefore the
> > degrees of freedom will be 1. Example:
> >
> > I have a data frame with 8 columns:
> >     V1   V2   V3   V4   W1   W2   W3   W4
> > 1    0   84   22   10    0   84    0    0
> > 2   35   84    0    0   22   84    0    0
> > 3    0    0    0   48    0    0    0   48
> > 4    0   48    0    0    0   48    0    0
> > 5    0   84    0    0    0   84    0    0
> > 6    0    0    0   48    0    0    0   48
> >
> > An example for the first row:
> >
> > The two largest values are 84 (in V2) and 22 (in V3), so these are
> > taken as the observed values. Since the largest values are in V2 and
> > V3, we have to pick the expected values from W2 and W3, which are 84
> > and 0. I know that for a chi-square test the values should not be 0,
> > but we will ignore the warning.
> >
> > Next it should generate a p-value for the second row, taking 35 and 84
> > (V1 and V2) as observed and 22 and 84 (W1 and W2) as expected. So it
> > will do a chi-square test for all 6 rows and generate 6 p-values. My
> > data frame has a lot of rows (approx. 9999).
> >
> > Can you please help me with this?
> >
> >
> >
> > Thanking you,
> > Warm Regards
> > Vikas Bansal
> > Msc Bioinformatics
> > Kings College London
> > ________________________________________
> > From: R. Michael Weylandt [michael.weyla...@gmail.com]
> > Sent: Wednesday, August 17, 2011 7:11 PM
> > To: Bansal, Vikas
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Chi square test on data frame
> >
> > I think everything below is right, but it's all a little helter-skelter,
> > so take it with a grain of salt:
> >
> > First things first, share your data with dput() when posting to the list.
> >
> > Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
> > 0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
> > 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
> > ), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
> > "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
> >
> > Now,
> >
> > Y1 = Y[,1:4]
> > Y2 = Y[,-(1:4)]
> >
> > id = apply(Y1,1,order,decreasing=TRUE)[1:2,]
> > # This has the columns you want in each row, but it's not directly
> > # appropriate for subsetting.
> > # Specifically, the problem is that the row information is implicit in
> > # where the col index is in id.
> > # We directly extract and force into a 2-col matrix that gives rows and
> > # columns for each data point.
> > id = cbind(as.vector(col(id)),as.vector(id))
> >
> > Now you can take
> >
> > Y1[id] as the observed values and Y2[id] as the expected.
> >
> > But, to be honest, it sounds like you have more problems in using a
> > chi-sq test than anything else. Beyond all the zeros, you should note
> > that you always have #obs >= #expected because Y1 >= Y2. I'll leave that
> > up to you though.
> >
> > Hope this helps, and please make sure you can take my code apart piece by
> > piece to understand it: there's some odd data manipulation that takes
> > advantage of R's way of coercing matrices to vectors, and if your actual
> > data isn't like the provided example, you may have to modify it.
> >
> > Michael Weylandt
> >
> > On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas <vikas.ban...@kcl.ac.uk>
> > wrote:
> > Is there anyone who can help me with a chi-square test on a data frame? I
> > have been struggling for the last 2 days. I will be very thankful to you.
> >
> > Dear all,
> >
> > I have been working on this problem for many hours but have not found
> > any solution.
> > I have a data frame with 8 columns:
> >     V1   V2   V3   V4   W1   W2   W3   W4
> > 1    0   84   22   10    0   84    0    0
> > 2   35   84    0    0   22   84    0    0
> > 3    0    0    0   48    0    0    0   48
> > 4    0   48    0    0    0   48    0    0
> > 5    0   84    0    0    0   84    0    0
> > 6    0    0    0   48    0    0    0   48
> >
> > From the first four columns, for each row I have to take the two largest
> > values, and these two values will be considered the observed values. From
> > the last four columns we get the expected values. So I have to perform a
> > chi-square test for each row to get p-values.
> >
> > An example for the first row:
> >
> > The two largest values are 84 (in V2) and 22 (in V3), so these are
> > taken as the observed values. Since the largest values are in V2 and
> > V3, we have to pick the expected values from W2 and W3, which are 84
> > and 0. I know that for a chi-square test the values should not be 0,
> > but we will ignore the warning.
> > Now that we have the observed as well as the expected values, we have to
> > perform a chi-square test to get p-values for each row in a new column.
> >
> >
> > So far I have been getting the indices of the two largest values with
> > sort.int(df,index.return=TRUE)$ix[c(4,3)]
> > but it does not accept a data frame.
> >
> > Can you please give me some idea of how to do this? It is very tricky,
> > and after studying a lot I am still not able to do it. Please help.
> >
> >
> >
> > Thanking you,
> > Warm Regards
> > Vikas Bansal
> > Msc Bioinformatics
> > Kings College London
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
