Hi, is there an alternative to

z <- read.zoo(DF, split = 2, index = 3, FUN = identity)

and

r <- rollapply(z, 3, sum.na, align = "right", partial = TRUE)

? I am trying to use the script below, in which the split variable (B) has about 300,000 unique values, and, not surprisingly, I am getting a memory allocation error. Thanks!
# devel version of zoo
install.packages("zoo", repos = "http://r-forge.r-project.org")

DF = data.frame(read.table(textConnection(" A B C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0"), header = TRUE, stringsAsFactors = FALSE))

library(zoo)
z <- read.zoo(DF, split = 2, index = 3, FUN = identity)
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(z, 3, sum.na, align = "right", partial = TRUE)
newDF <- lapply(1:nrow(r), function(i)
  prop.table(na.omit(matrix(r[i,], ncol = 4, byrow = TRUE,
    dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
names(newDF) <- time(z)
lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))

Gabor Grothendieck wrote:
>
> On Sat, Apr 9, 2011 at 5:14 AM, mathijsdevaan
> <mathijsdev...@gmail.com> wrote:
>> Hi,
>>
>> I need to perform calculations on subsets of a data frame:
>>
>> DF = data.frame(read.table(textConnection(" A B C D E F
>> 1 a 1995 0 4 1
>> 2 a 1997 1 1 3
>> 3 b 1995 3 7 0
>> 4 b 1996 1 2 3
>> 5 b 1997 1 2 3
>> 6 b 1998 6 0 0
>> 7 b 1999 3 7 0
>> 8 c 1997 1 2 3
>> 9 c 1998 1 2 3
>> 10 c 1999 6 0 0
>> 11 d 1999 3 7 0
>> 12 e 1995 1 2 3
>> 13 e 1998 1 2 3
>> 14 e 1999 6 0 0"), header = TRUE, stringsAsFactors = FALSE))
>>
>> I'd like to create a new data frame for each unique year in which, for
>> each value of B, the values of D, E and F are summed over the last 3
>> years (e.g. 1998 = 1998, 1997, 1996).
>>
>> Question 1: How do I go from DF to newDFyear?
>>
>> Examples:
>>
>> newDF1995
>>   B D E F
>>   a 0 4 1
>>   b 3 7 0
>>   e 1 2 3
>>
>> newDF1998
>>   B D E F
>>   a 1 1 3
>>   b 8 4 6
>>   c 2 4 6
>>   e 1 2 3
>>
>> Then, for each new data frame I need to generate a square matrix after
>> doing the following:
>>
>> newDF1998$G <- newDF1998$D + newDF1998$E + newDF1998$F
>> newDF1998$D <- newDF1998$D / newDF1998$G
>> newDF1998$E <- newDF1998$E / newDF1998$G
>> newDF1998$F <- newDF1998$F / newDF1998$G
>> newDF1998 <- newDF1998[, c(-5)]
>>
>> newDF1998 (rounded to one decimal)
>>   B   D   E   F
>>   a 0.2 0.2 0.6
>>   b 0.4 0.2 0.3
>>   c 0.2 0.3 0.5
>>   e 0.2 0.3 0.5
>>
>> Question 2: How do I go from newDF1998 to a matrix
>>
>>     a b c e
>>   a
>>   b
>>   c
>>   e
>>
>> in which cell ab (the cosine similarity of rows a and b) is
>> (0.2*0.4 + 0.2*0.2 + 0.6*0.3) / ((0.2*0.2 + 0.2*0.2 + 0.6*0.6)^0.5 *
>> (0.4*0.4 + 0.2*0.2 + 0.3*0.3)^0.5) = 0.84
>
> First we use read.zoo to reshape DF into a multivariate time series and
> use rollapply (where we have used the devel version of zoo since it
> supports the partial= argument on rollapply). We then reshape each
> resulting row into a matrix, converting each row of each matrix to
> proportions. Finally we form the desired scaled cross product.
>
> # devel version of zoo
> install.packages("zoo", repos = "http://r-forge.r-project.org")
> library(zoo)
>
> z <- read.zoo(DF, split = 2, index = 3, FUN = identity)
>
> sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
> r <- rollapply(z, 3, sum.na, align = "right", partial = TRUE)
>
> newDF <- lapply(1:nrow(r), function(i)
>   prop.table(na.omit(matrix(r[i,], ncol = 4, byrow = TRUE,
>     dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
> names(newDF) <- time(z)
>
> lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://r.789695.n4.nabble.com/Yearly-aggregates-and-matrices-tp3438140p3478997.html
Sent from the R help mailing list archive at Nabble.com.
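A possible alternative for the large case, sketched here under assumptions rather than taken from the thread: keep DF in long format and, for each year y, sum D, E and F within each value of B over the calendar years y-2 to y with base R's rowsum(), so the wide zoo object (one column per combination of B value and variable) is never built. Two differences from the rollapply version: the window here is defined by calendar years, matching "1998 = 1998, 1997, 1996" in the question, whereas rollapply's width counts rows of z (the years that occur in the data); and B values with no rows in a window are simply absent rather than NA. The names vars, years, props, cosine and sims are illustrative only. Note that the final similarity step is still an n x n matrix for the n values of B present in a year's window, at 8 bytes per entry, so with hundreds of thousands of B values in one year that matrix alone would not fit in memory and only the pairs of interest could be computed.

```r
# Sketch (assumption, not from the thread): rolling 3-year sums per B and
# per-year cosine-similarity matrices, computed in long format with base R.
DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
A B C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0")

vars  <- c("D", "E", "F")      # hypothetical name: columns to accumulate
years <- sort(unique(DF$C))    # one result per year present in the data

# Question 1: for each year y, sum D, E, F within each B over years y-2..y.
# rowsum() returns one row per B value found in the window, sorted by B.
newDF <- lapply(years, function(y) {
  w <- DF[DF$C >= y - 2 & DF$C <= y, ]
  rowsum(as.matrix(w[vars]), w$B)
})
names(newDF) <- years

# Convert each row to proportions of its row total.
props <- lapply(newDF, function(m) prop.table(m, 1))

# Question 2: cosine similarity between every pair of rows (values of B).
cosine <- function(m) {
  u <- m / sqrt(rowSums(m^2))  # scale each row to unit Euclidean length
  tcrossprod(u)                # u %*% t(u): an n x n matrix, n = nrow(m)
}
sims <- lapply(props, cosine)

print(round(sims[["1998"]], 2))
```

Because cosine similarity does not change when a row is rescaled, the prop.table() step has no effect on sims; it is kept only because the question asks for the proportion tables. Computed from unrounded values, sims[["1998"]]["a", "b"] equals 30 / sqrt(11 * 116), about 0.84, in line with the hand calculation in the question.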