Re: [R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

Maya Joshi Sat, 03 Sep 2011 06:19:58 -0700

Dear R experts:

Thank you Dennis and David ...


As David pointed out, my wording was unclear. I apologize, and I have tried
below to explain what I intend to do, using Dennis's solution code:


ped <- rep(1:3, c(4, 3, 3))
y <- rnorm(10, 8, 2)
# This replaces all of your sample() statements, and is equivalent:
smat <- matrix(sample(1:3, 120, replace = TRUE), ncol = 12)
colnames(smat) <- c('M1a', 'M1b', 'M1aP1', 'M1bP2',
                    'M2a', 'M2b', 'M2aP1', 'M2bP2',
                    'M3a', 'M3b', 'M3aP1', 'M3bP2')
mydf <- as.data.frame(cbind(ped, y, smat))

# each row of mmat names one set of four variables in mydf
mmat <- matrix(c("M1a", "M2a", "M3a", "M1b", "M2b", "M3b",
                 "M1aP1", "M2aP1", "M3aP1", "M1bP2", "M2bP2", "M3bP2"),
               ncol = 4)
mmat
     [,1]  [,2]  [,3]    [,4]
[1,] "M1a" "M1b" "M1aP1" "M1bP2"
[2,] "M2a" "M2b" "M2aP1" "M2bP2"
[3,] "M3a" "M3b" "M3aP1" "M3bP2"

For each of the three rows of mmat, I want to compare the mydf columns named
in [,1] and [,3] (mydf[x[1]] == mydf[x[3]]), and likewise those named in [,2]
and [,4]. mmat tells me which variables to pick when working on mydf. In my
real dataset I have 1000 such sets of variables.

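# first function: score one set of variables (one row of mmat)
myfun <- function(x) {
  x <- as.vector(x)
  # 1 where the two named columns agree in a row of mydf, -1 otherwise
  ot1 <- ifelse(mydf[x[1]] == mydf[x[3]], 1, -1)
  ot2 <- ifelse(mydf[x[2]] == mydf[x[4]], 1, -1)
  qt <- ot1 + ot2  # takes the values -2, 0 or 2
  return(qt)
}
qt <- apply(mmat, 1, myfun)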

This creates a matrix with one row per row of mydf and one column per set of
variables:
      [,1] [,2] [,3]
 [1,]    0   -2    0
 [2,]   -2    0   -2
 [3,]    0   -2    0
 [4,]    0    0    2
 [5,]    0   -2   -2
 [6,]   -2    0   -2
 [7,]   -2   -2    0
 [8,]   -2    0    0
 [9,]   -2    0    2
[10,]    0    0    0
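With 1000 sets of variables, the row-by-row apply() call above may become slow. As a rough sketch only, the same scores can be computed by comparing whole blocks of columns at once. The helper name match_score and the fixed seed are assumptions introduced for illustration; they are not part of the original code.

```r
# Rebuild the example data from the post (the seed is an assumption,
# added only so the sketch is reproducible)
set.seed(1)
ped <- rep(1:3, c(4, 3, 3))
y <- rnorm(10, 8, 2)
smat <- matrix(sample(1:3, 120, replace = TRUE), ncol = 12)
colnames(smat) <- c('M1a', 'M1b', 'M1aP1', 'M1bP2',
                    'M2a', 'M2b', 'M2aP1', 'M2bP2',
                    'M3a', 'M3b', 'M3aP1', 'M3bP2')
mydf <- as.data.frame(cbind(ped, y, smat))
mmat <- matrix(c("M1a", "M2a", "M3a", "M1b", "M2b", "M3b",
                 "M1aP1", "M2aP1", "M3aP1", "M1bP2", "M2bP2", "M3bP2"),
               ncol = 4)

# Hypothetical helper: a and b are character vectors of column names of
# equal length; returns 1 where the paired columns agree, -1 otherwise
match_score <- function(a, b) {
  ifelse(as.matrix(mydf[a]) == as.matrix(mydf[b]), 1, -1)
}

# One column per row of mmat, one row per row of mydf
qt_vec <- match_score(mmat[, 1], mmat[, 3]) +
          match_score(mmat[, 2], mmat[, 4])
dimnames(qt_vec) <- NULL
```

On this toy data qt_vec should hold the same values as the qt matrix returned by apply(mmat, 1, myfun), since both compare the same pairs of columns.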

ydv <- c((y - mean(y))^2)  # squared deviation of each y value from mean(y)
[1]  9.5012525  0.2578341  1.6676271  6.3102202 12.8701830  9.5509480
[7]  0.8661107  3.1828185  0.9215140  1.0909813

# new data.frame combining ped, ydv and the output of the first function
qtd <- data.frame(ped, ydv, qt)
  ped        ydv X1 X2 X3
1    1  9.5012525  0 -2  0
2    1  0.2578341 -2  0 -2
3    1  1.6676271  0 -2  0
4    1  6.3102202  0  0  2
5    2 12.8701830  0 -2 -2
6    2  9.5509480 -2  0 -2
7    2  0.8661107 -2 -2  0
8    3  3.1828185 -2  0  0
9    3  0.9215140 -2  0  2
10   3  1.0909813  0  0  0

Now I want to calculate Rt for each of X1, X2, X3 within each Ped (in the
real data there will be 1000 such columns). The result of the following
function should be a 3 x 3 matrix, laid out as below. This is just an example;
my real data have around 200 Ped levels and around 1000 X variables.
# Rt values
 Ped      X1    X2        X3
1
2
3

# second function
myfun2 <- function(dataframe) {
  vydv <- sum(ydv)*0.25
  sumD <- sum(ydv * qt)
  Rt <- vydv / sumD
  return(Rt)
}

# using plyr
require(plyr)
dfsumd1 <- ddply(mydf,.(mydf$ped),myfun2)

 dfsumd1
  mydf$ped         V1
1        1 -0.1047935
2        2 -0.1047935
3        3 -0.1047935

This is not what I want: every Ped gets the same value. I want a separate Rt
value for each Ped and for each of the X variables in the qtd data frame above.
# Rt values
 Ped      X1    X2        X3
1
2
3

Then I can sum Ped$X1, Ped$X2, Ped$X3. The idea is to calculate a separate
Rt value for each variable within each Ped group, and then add the values.
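One possible reading of the goal above, sketched in base R. In the code posted, myfun2 never uses its dataframe argument: it reads ydv and qt from the workspace, so every group returns the same number, and ddply is given mydf rather than qtd. The sketch below instead takes both from the subset it is handed. The helper name rt_one_group, the fixed seed, and the stand-in qt matrix are assumptions for illustration; whether sum(ydv) should be taken within each Ped or over the whole data is also an assumption (here: within each Ped).

```r
# Stand-in data with the same shape as qtd in the post (values are made up)
set.seed(42)
ped <- rep(1:3, c(4, 3, 3))
y <- rnorm(10, 8, 2)
ydv <- (y - mean(y))^2
qt <- matrix(sample(c(-2, 0, 2), 30, replace = TRUE), ncol = 3)
qtd <- data.frame(ped, ydv, qt)  # columns: ped, ydv, X1, X2, X3

# Hypothetical helper: Rt for every X column of one Ped subset d
rt_one_group <- function(d) {
  xcols <- setdiff(names(d), c("ped", "ydv"))
  vydv <- sum(d$ydv) * 0.25
  # A zero denominator gives Inf here; the real data may need a guard
  sapply(d[xcols], function(x) vydv / sum(d$ydv * x))
}

# Split by Ped, apply the helper, and arrange as Ped (rows) by X (columns)
rt <- t(sapply(split(qtd, qtd$ped), rt_one_group))
rt
```

From there, colSums(rt) would give one total per X variable and rowSums(rt) one total per Ped, depending on which sum is meant. The same helper could probably be passed to ddply(qtd, .(ped), rt_one_group) as well, though that is untested here.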

Thank you so much for your time. I hope I have made it clear now.

Maya


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
