Here are two approaches to try:

> # test data
> d1 <- data.frame(x = Sys.Date() + 1:3)
> d2 <- data.frame(x = Sys.Date() - 1:3)
> # 1. you might not have enough memory for this, but it's short
> table(outer(1:3, -(1:3), "-"))

2 3 4 5 6 
1 2 3 2 1 

> # 2. this one performs all the operations outside of R and only brings
> # the result back in, so it won't need as much memory
>
> library(sqldf)
> sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x")
  d1.x - d2.x count(*)
1           2        1
2           3        2
3           4        3
4           5        2
5           6        1

On Mon, Feb 15, 2010 at 9:17 PM, Jonathan <jonsle...@gmail.com> wrote:
> Let me fix a couple of typos in that email:
>
> Hi All:
>
> Let's say I have two dataframes (Condition1 and Condition2), on the
> order of 12,000 and 16,000 rows respectively; 1 column each. The
> entries contain dates.
>
> I'd like to calculate, for each possible pair of dates (that is:
> Condition1[1:12,000] and Condition2[1:16,000]), the number of days
> difference between the dates in the pair. The result should be a
> matrix 12,000 by 16,000, which I'll call M. The purpose of building
> such a matrix M is to create a histogram of all the values contained
> within it.
>
> Ex):
> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>
> First, my instinct is to try and vectorize the operation. I tried
> this by expanding each vector into a matrix of repeated vectors (I'd
> then just subtract the two resultant matrices to get matrix M). I got
> the following error:
>
>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
>>                              byrow=TRUE, ncol=nrow(Condition1))
> Error: cannot allocate vector of size 732.4 Mb
>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
>>                              byrow=FALSE, nrow=nrow(Condition2))
> Error: cannot allocate vector of size 732.4 Mb
>
> Since it seems these matrices are too large, I'm wondering whether
> there's a better way to call a hist command without actually building
> the said matrix.
>
> I'd greatly appreciate any ideas!
>
> Best,
> Jonathan
>
> On Mon, Feb 15, 2010 at 8:19 PM, Jonathan <jonsle...@gmail.com> wrote:
>> Hi All:
>>
>> Let's say I have two dataframes (Condition1 and Condition2); each
>> being on the order of 12,000 and 16,000 rows; 1 column. The entries
>> contain dates.
>>
>> I'd like to calculate, for each possible pair of dates (that is:
>> Condition1[1:10,000] and Condition2[1:10,000], the number of days
>> difference between the dates in the pair. The result should be a
>> matrix 12,000 by 16,000. Really, what I need is a histogram of all
>> the values in this matrix.
>>
>> Ex):
>> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
>> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>>
>> First, my instinct is to try and vectorize the operation. I tried
>> this by expanding each vector into a matrix of repeated vectors (I'd
>> then just subtract the two). I got the following error:
>>
>>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
>>>                              byrow=TRUE, ncol=nrow(Condition1))
>> Error: cannot allocate vector of size 732.4 Mb
>>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
>>>                              byrow=FALSE, nrow=nrow(Condition2))
>> Error: cannot allocate vector of size 732.4 Mb
>>
>> Since it seems these matrices are too large, I'm wondering whether
>> there's a better way to call a hist command without actually building
>> the said matrix..
>>
>> I'd greatly appreciate any ideas!
>>
>> Best,
>> Jonathan
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
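If the dates repeat heavily, as they do in the example data quoted above, a third option may be to tabulate each column first and then combine the tallies, so that only the unique dates are ever crossed. The following is a minimal sketch under that assumption, using base R only. `diff_counts` is a hypothetical helper name, not part of any package, and the sign convention (first argument minus second) is also an assumption.

```r
# Hypothetical helper (not from any package): counts of all pairwise
# day differences a[i] - b[j], built from per-date tallies only.
diff_counts <- function(a, b) {
  # Represent dates as whole days since 1970-01-01
  a <- as.integer(as.Date(a))
  b <- as.integer(as.Date(b))

  # How often each distinct date occurs in each input
  ta <- table(a)
  tb <- table(b)
  ua <- as.integer(names(ta))
  ub <- as.integer(names(tb))

  # Differences between unique dates only, and the number of
  # (a, b) pairs that each of those differences stands for
  d <- outer(ua, ub, "-")
  w <- outer(as.vector(ta), as.vector(tb))

  # Add up the pair counts for each distinct difference
  tapply(as.vector(w), as.vector(d), sum)
}

# Toy data from the reply above
d1 <- data.frame(x = Sys.Date() + 1:3)
d2 <- data.frame(x = Sys.Date() - 1:3)
toy <- diff_counts(d1$x, d2$x)

# Example data from the quoted question
Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))
counts <- diff_counts(Condition1$dates, Condition2$dates)

# Histogram-like display: one vertical line per distinct difference
plot(as.integer(names(counts)), as.vector(counts), type = "h",
     xlab = "difference in days", ylab = "number of pairs")
```

On the toy data this should give the same tallies as approaches 1 and 2. Memory use here grows with the number of distinct dates in each column rather than with the number of rows, so it helps little if nearly every date is unique; in that case the sqldf approach is probably the safer choice.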