Re: [R] Chi square test on data frame

R. Michael Weylandt Wed, 17 Aug 2011 11:12:08 -0700

I think everything below is right, but it's all a little helter-skelter so
take it with a grain of salt:


First things first: when posting to the list, please share your data with
dput() so others can paste it straight into R. For your example that gives:

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Now,

Y1 = Y[,1:4]
Y2 = Y[,-(1:4)]

id = apply(Y1, 1, order, decreasing = TRUE)[1:2, ]
# Each column of id now holds the column indices of the two largest values
# in the corresponding row of Y1, but it is not directly usable for
# subsetting: the row information is only implicit in which column of id
# an index sits in.
# So we extract the row numbers explicitly and build a 2-column matrix of
# (row, column) pairs, one per data point.
id = cbind(as.vector(col(id)), as.vector(id))
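
To see what the two-column index does, here is a small self-contained check.
It is only a sketch on the example data; the names top2 and idx are mine and
stand in for the two stages of id above.

```r
# Rebuild the example data so this snippet runs on its own
Y <- matrix(c(0, 35, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              22, 0, 0, 0, 0, 0,   10, 0, 48, 0, 0, 48,
              0, 22, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              0, 0, 0, 0, 0, 0,    0, 0, 48, 0, 0, 48),
            nrow = 6,
            dimnames = list(as.character(1:6),
                            c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
Y1 <- Y[, 1:4]
Y2 <- Y[, 5:8]

# apply() over rows returns one COLUMN per original row, so top2 is 2 x 6
top2 <- apply(Y1, 1, order, decreasing = TRUE)[1:2, ]

# Pair every column index with the row it came from: a 12 x 2 matrix
idx <- cbind(as.vector(col(top2)), as.vector(top2))

# Indexing a matrix with a 2-column matrix picks one element per (row, col)
Y1[idx][1:2]   # the two observed values for row 1
Y2[idx][1:2]   # the matching expected values for row 1
```

For row 1 this picks V2 and V3 from Y1 and W2 and W3 from Y2, which is the
pairing described in the original question.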

Now you can take

Y1[id] as the observed values and Y2[id] as the expected.
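
One possible way to go from there to a p-value per row is sketched below. It
assumes the expected values are meant as relative proportions and passes them
to chisq.test() via p with rescale.p = TRUE; that reading is my assumption,
not something stated in the question. The names obs, expd and pvals are
illustrative only.

```r
# Rebuild the example data and the (row, column) index
Y <- matrix(c(0, 35, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              22, 0, 0, 0, 0, 0,   10, 0, 48, 0, 0, 48,
              0, 22, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              0, 0, 0, 0, 0, 0,    0, 0, 48, 0, 0, 48),
            nrow = 6,
            dimnames = list(as.character(1:6),
                            c("V1", "V2", "V3", "V4", "W1", "W2", "W3", "W4")))
Y1 <- Y[, 1:4]
Y2 <- Y[, 5:8]
id <- apply(Y1, 1, order, decreasing = TRUE)[1:2, ]
id <- cbind(as.vector(col(id)), as.vector(id))

# Y1[id] comes out as row1-first, row1-second, row2-first, ...,
# so filling by row gives one line of two values per original row
obs  <- matrix(Y1[id], ncol = 2, byrow = TRUE)
expd <- matrix(Y2[id], ncol = 2, byrow = TRUE)

# One goodness-of-fit test per row; warnings about small or zero
# expected counts are suppressed here, as the question asks
pvals <- sapply(seq_len(nrow(obs)), function(i) {
  suppressWarnings(
    chisq.test(obs[i, ], p = expd[i, ], rescale.p = TRUE)$p.value
  )
})

cbind(obs, expd, p.value = pvals)
```

Wherever an expected count is zero the test statistic involves a division by
zero, so the p-value for that row should not be trusted.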

But, to be honest, it sounds like your bigger problem is whether a chi-squared
test is appropriate here at all. Beyond all the zeros, note that the observed
value is always >= the expected value, because Y1 >= Y2 elementwise. I'll
leave that up to you though.
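
Both points can be checked quickly on the example data (a sketch only; it
says nothing about your real data):

```r
# Rebuild the example data so this snippet runs on its own
Y <- matrix(c(0, 35, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              22, 0, 0, 0, 0, 0,   10, 0, 48, 0, 0, 48,
              0, 22, 0, 0, 0, 0,   84, 84, 0, 48, 84, 0,
              0, 0, 0, 0, 0, 0,    0, 0, 48, 0, 0, 48),
            nrow = 6)
Y1 <- Y[, 1:4]
Y2 <- Y[, 5:8]

all(Y1 >= Y2)    # is every observed value at least its expected value?
sum(Y2 == 0)     # how many of the 24 expected values are zero?
```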

Hope this helps, and please make sure you can take my code apart piece by
piece to understand it: there's some odd data manipulation that takes
advantage of R's way of coercing matrices to vectors, and if your actual data
isn't like the provided example, you may have to modify it.

Michael Weylandt

On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas <vikas.ban...@kcl.ac.uk> wrote:

> Is there anyone who can help me with a chi-square test on a data frame? I
> have been struggling for the last 2 days. I would be very thankful to you.
>
> Dear all,
>
> I have been working on this problem for many hours but have not found a
> solution.
> I have a data frame with 8 columns:
>      V1   V2   V3   V4   W1   W2   W3   W4
> 1     0   84   22   10    0   84    0    0
> 2    35   84    0    0   22   84    0    0
> 3     0    0    0   48    0    0    0   48
> 4     0   48    0    0    0   48    0    0
> 5     0   84    0    0    0   84    0    0
> 6     0    0    0   48    0    0    0   48
>
> From the first four columns, for each row, I have to take the two largest
> values; these two values are the observed values. The expected values come
> from the last four columns. I then have to perform a chi-square test for
> each row to get p-values.
>
> An example for the first row:
>
> The two largest values are 84 (in V2) and 22 (in V3), so these are the
> observed values. Since the largest values are in V2 and V3, we have to pick
> the expected values from W2 and W3, which are 84 and 0. I know that values
> should not be 0 for a chi-square test, but we will ignore the warning.
> Now that we have both observed and expected values, we have to perform the
> chi-square test to get a p-value for each row in a new column.
>
>
> So far I have tried to get the indices of the two largest values with
> sort.int(df,index.return=TRUE)$ix[c(4,3)]
> but it does not accept a data frame.
>
> Can you please give me some idea of how to do this? It is very tricky and,
> even after studying a lot, I have not been able to do it. Please help.
>
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


