Hi all,

I've come up with a solution to the following problem that relies on a for loop, and I was wondering whether anybody has insight into a more elegant method:
I have two data frames, each with a column of categorical data and a column of dates. Ideally, I'd like to calculate the number of days between every pair of dates drawn from data frame 1 and data frame 2 (*but only for pairs whose members belong to the same category*). The number of members in each category differs between the two data frames. For example:

> d <- seq(as.Date("2000-02-12"), as.Date("2009-08-18"), by="weeks")
> df1 <- data.frame('A'=sample(1:200,10), 'date'=d[sample(1:length(d),10)], 'category'=sample(1:4,10,replace=TRUE))
> df2 <- data.frame('A'=sample(1:200,10), 'date'=d[sample(1:length(d),10)], 'category'=sample(1:4,10,replace=TRUE))
> df1
     A       date category
1   93 2004-02-28        3
2  105 2001-03-17        3
3  189 2009-07-04        2
4  130 2003-07-05        2
5  160 2005-09-24        2
6   32 2004-11-06        2
7  117 2007-03-17        1
8  161 2003-07-19        4
9  153 2001-09-15        3
10 173 2005-08-27        1
> df2
     A       date category
1  102 2006-08-19        3
2   68 2004-11-27        2
3  137 2003-01-11        1
4   39 2002-12-28        2
5  127 2004-03-06        4
6  125 2002-02-23        2
7  150 2002-05-18        4
8   19 2003-02-22        1
9   80 2000-08-05        1
10  94 2003-12-27        1

Within a loop, I'd do the following (i is my counter; for the example, I set it to 1):

> i <- 1
# Create the data frames, selecting only the rows from category i:
> yeari_1 <- df1[which(df1['category']==i),]; yeari_2 <- df2[which(df2['category']==i),]
> yeari_1
     A       date category
7  117 2007-03-17        1
10 173 2005-08-27        1
> yeari_2
     A       date category
3  137 2003-01-11        1
8   19 2003-02-22        1
9   80 2000-08-05        1
10  94 2003-12-27        1
# Convert dates to integers (days since 1970-01-01):
> yeari_1[[2]] <- as.integer(as.Date(yeari_1[[2]])); yeari_2[[2]] <- as.integer(as.Date(yeari_2[[2]]))
> yeari_1
     A  date category
7  117 13589        1
10 173 13022        1
> yeari_2
     A  date category
3  137 12063        1
8   19 12105        1
9   80 11174        1
10  94 12413        1
# Get the differences of all pairs:
> result <- outer(yeari_1[[2]], yeari_2[[2]], '-')
> result
     [,1] [,2] [,3] [,4]
[1,] 1526 1484 2415 1176
[2,]  959  917 1848  609
# Now, merge these results with the results from the earlier iterations (previous values of i), increment i to the next value, and repeat.

----
Ideally, I could accomplish this in some sort of vectorized manner, although the Force is not yet strong with me. Any ideas would be appreciated!

Regards,
Jonathan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
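For reference, the full loop described in the post might look roughly like the sketch below. Because `sample()` makes the original `df1`/`df2` random, the frames are rebuilt literally from the printed output so the result is reproducible. The names `cats` and `results` are illustrative, and collecting one matrix per category in a named list is an assumption about what "merge the results" should mean.

```r
# Rebuild df1/df2 from the printed output in the post (sample() is random,
# so these literal values stand in for the original calls).
df1 <- data.frame(
  A = c(93, 105, 189, 130, 160, 32, 117, 161, 153, 173),
  date = as.Date(c("2004-02-28", "2001-03-17", "2009-07-04", "2003-07-05",
                   "2005-09-24", "2004-11-06", "2007-03-17", "2003-07-19",
                   "2001-09-15", "2005-08-27")),
  category = c(3, 3, 2, 2, 2, 2, 1, 4, 3, 1)
)
df2 <- data.frame(
  A = c(102, 68, 137, 39, 127, 125, 150, 19, 80, 94),
  date = as.Date(c("2006-08-19", "2004-11-27", "2003-01-11", "2002-12-28",
                   "2004-03-06", "2002-02-23", "2002-05-18", "2003-02-22",
                   "2000-08-05", "2003-12-27")),
  category = c(3, 2, 1, 2, 4, 2, 4, 1, 1, 1)
)

# Only categories present in both frames can produce pairs.
cats <- sort(intersect(df1$category, df2$category))

# Illustrative assumption: store one matrix per category in a named list
# instead of "merging" into a single object.
results <- vector("list", length(cats))
names(results) <- cats
for (i in seq_along(cats)) {
  # Dates as integer day counts since 1970-01-01, as in the post
  d1 <- as.integer(df1$date[df1$category == cats[i]])
  d2 <- as.integer(df2$date[df2$category == cats[i]])
  # Rows index df1 dates, columns index df2 dates
  results[[i]] <- outer(d1, d2, "-")
}
results[["1"]]
```

Here `results[["1"]]` reproduces the category-1 matrix shown in the post (rows `1526 1484 2415 1176` and `959 917 1848 609`).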
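One possible loop-free approach (a swapped-in technique, not something from the original post) is to let `merge()` do the pairing: merging on the shared `category` column yields every (df1 row, df2 row) combination within a category, and the day differences then become a single vectorized subtraction. The data are rebuilt from the printed output, and the names `pairs` and `days` are hypothetical.

```r
# Rebuild df1/df2 from the printed output in the post.
df1 <- data.frame(
  A = c(93, 105, 189, 130, 160, 32, 117, 161, 153, 173),
  date = as.Date(c("2004-02-28", "2001-03-17", "2009-07-04", "2003-07-05",
                   "2005-09-24", "2004-11-06", "2007-03-17", "2003-07-19",
                   "2001-09-15", "2005-08-27")),
  category = c(3, 3, 2, 2, 2, 2, 1, 4, 3, 1)
)
df2 <- data.frame(
  A = c(102, 68, 137, 39, 127, 125, 150, 19, 80, 94),
  date = as.Date(c("2006-08-19", "2004-11-27", "2003-01-11", "2002-12-28",
                   "2004-03-06", "2002-02-23", "2002-05-18", "2003-02-22",
                   "2000-08-05", "2003-12-27")),
  category = c(3, 2, 1, 2, 4, 2, 4, 1, 1, 1)
)

# Joining on 'category' pairs each df1 row with every df2 row of the same
# category; the overlapping columns A and date get the suffixes .1 and .2.
pairs <- merge(df1, df2, by = "category", suffixes = c(".1", ".2"))

# Subtracting Dates gives a difftime in days; as.integer() drops the class.
pairs$days <- as.integer(pairs$date.1 - pairs$date.2)

head(pairs)
```

The design trade-off is layout: this produces one long table (25 rows for the example data) with `A.1`/`A.2` identifying each pair, rather than one matrix per category, which is often easier to summarize afterwards.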
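To keep the per-category matrix layout from the post while avoiding the explicit counter, one option (again an alternative technique, not the poster's code) is to `split()` each frame's dates by category and pair the pieces with `Map()`. The data are rebuilt from the printed output, and the names `s1`, `s2`, `keys` and `results` are illustrative.

```r
# Rebuild df1/df2 from the printed output in the post.
df1 <- data.frame(
  A = c(93, 105, 189, 130, 160, 32, 117, 161, 153, 173),
  date = as.Date(c("2004-02-28", "2001-03-17", "2009-07-04", "2003-07-05",
                   "2005-09-24", "2004-11-06", "2007-03-17", "2003-07-19",
                   "2001-09-15", "2005-08-27")),
  category = c(3, 3, 2, 2, 2, 2, 1, 4, 3, 1)
)
df2 <- data.frame(
  A = c(102, 68, 137, 39, 127, 125, 150, 19, 80, 94),
  date = as.Date(c("2006-08-19", "2004-11-27", "2003-01-11", "2002-12-28",
                   "2004-03-06", "2002-02-23", "2002-05-18", "2003-02-22",
                   "2000-08-05", "2003-12-27")),
  category = c(3, 2, 1, 2, 4, 2, 4, 1, 1, 1)
)

# Integer day counts grouped by category; list names are the category labels.
s1 <- split(as.integer(df1$date), df1$category)
s2 <- split(as.integer(df2$date), df2$category)

# Keep only categories that occur in both frames, in matching order.
keys <- intersect(names(s1), names(s2))

# Map() applies outer() to each matching pair of vectors and keeps the names.
results <- Map(function(a, b) outer(a, b, "-"), s1[keys], s2[keys])
results[["1"]]
```

`split()` preserves the original row order within each group, so `results[["1"]]` matches the post's category-1 matrix.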