Hi All,
   I've come up with a solution for this problem that relies on a for loop,
and I was wondering if anybody had any insight into a more elegant method:

I have two data frames, each with a column of categorical data and a column
of dates.  What I'd like to do, ideally, is calculate the number of days
between all pairs of dates in data frame 1 and data frame 2 (*but only for
members of the same category*).  The number of members in each category
differs between the two data frames.

For example:


> d <- seq(as.Date("2000-02-12"), as.Date("2009-08-18"), by="weeks")

> df1 <- data.frame('A'=sample(1:200,10),
+   'date'=d[sample(1:length(d),10)],'category'=sample(1:4,10,replace=TRUE))

> df2 <- data.frame('A'=sample(1:200,10),
+   'date'=d[sample(1:length(d),10)],'category'=sample(1:4,10,replace=TRUE))


> df1
     A       date category
1   93 2004-02-28        3
2  105 2001-03-17        3
3  189 2009-07-04        2
4  130 2003-07-05        2
5  160 2005-09-24        2
6   32 2004-11-06        2
7  117 2007-03-17        1
8  161 2003-07-19        4
9  153 2001-09-15        3
10 173 2005-08-27        1


> df2
     A       date category
1  102 2006-08-19        3
2   68 2004-11-27        2
3  137 2003-01-11        1
4   39 2002-12-28        2
5  127 2004-03-06        4
6  125 2002-02-23        2
7  150 2002-05-18        4
8   19 2003-02-22        1
9   80 2000-08-05        1
10  94 2003-12-27        1


Within a loop, I'd do the following (i is my counter; for the example,
I set it to 1):


> i<-1

# Create the data frames:

> yeari_1 <- df1[which(df1['category']==i),]
> yeari_2 <- df2[which(df2['category']==i),]

# Select only the data from category i

> yeari_1
     A       date category
7  117 2007-03-17        1
10 173 2005-08-27        1

> yeari_2
     A       date category
3  137 2003-01-11        1
8   19 2003-02-22        1
9   80 2000-08-05        1
10  94 2003-12-27        1

# Convert dates to integers

> yeari_1[[2]] <- as.integer(as.Date(yeari_1[[2]]))
> yeari_2[[2]] <- as.integer(as.Date(yeari_2[[2]]))

> yeari_1
     A  date category
7  117 13589        1
10 173 13022        1
> yeari_2
     A  date category
3  137 12063        1
8   19 12105        1
9   80 11174        1
10  94 12413        1

# Get differences of all pairs:

> result <- outer(yeari_1[[2]],yeari_2[[2]],'-')
> result
     [,1] [,2] [,3] [,4]
[1,] 1526 1484 2415 1176
[2,]  959  917 1848  609

# Now merge the results with those from all the earlier iterations
# (previous values of i), increment i to the next value, and repeat.
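
Put together, the loop looks roughly like the sketch below.  To keep it
runnable without the random sampling, the two data frames here are just the
category 1 and category 4 rows copied from the printouts above.  `results`
is a name made up for this sketch: a list holding one matrix of day
differences per category.

```r
# Stand-in data: the category 1 and category 4 rows from the example above.
df1 <- data.frame(A = c(117, 173, 161),
                  date = as.Date(c("2007-03-17", "2005-08-27", "2003-07-19")),
                  category = c(1, 1, 4))
df2 <- data.frame(A = c(137, 19, 80, 94, 127, 150),
                  date = as.Date(c("2003-01-11", "2003-02-22", "2000-08-05",
                                   "2003-12-27", "2004-03-06", "2002-05-18")),
                  category = c(1, 1, 1, 1, 4, 4))

# 'results' is a hypothetical container: one matrix per category.
results <- list()

# Only categories present in both data frames can produce pairs.
for (i in intersect(df1$category, df2$category)) {
  # Dates of category i, converted to integer day counts.
  d1 <- as.integer(df1$date[df1$category == i])
  d2 <- as.integer(df2$date[df2$category == i])

  # Rows index the df1 dates, columns index the df2 dates.
  results[[as.character(i)]] <- outer(d1, d2, "-")
}

results[["1"]]
```

For category 1 this gives the same 2 x 4 matrix as shown above.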


----

Ideally, I could accomplish this in some sort of vectorized manner,
although the Force is not yet strong with me.  Any ideas would be
appreciated!
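
One candidate that might do this without the loop is merge(): joining the
two data frames on `category` should pair every row of one with every row of
the other inside each category.  This is only a sketch and has not been
checked against the loop beyond this small case.  The stand-in data are
again the category 1 and 4 rows from above, and the names `pairs`, `days`
and the ".1"/".2" suffixes are arbitrary choices for illustration.

```r
# Stand-in data: the category 1 and category 4 rows from the example above.
df1 <- data.frame(A = c(117, 173, 161),
                  date = as.Date(c("2007-03-17", "2005-08-27", "2003-07-19")),
                  category = c(1, 1, 4))
df2 <- data.frame(A = c(137, 19, 80, 94, 127, 150),
                  date = as.Date(c("2003-01-11", "2003-02-22", "2000-08-05",
                                   "2003-12-27", "2004-03-06", "2002-05-18")),
                  category = c(1, 1, 1, 1, 4, 4))

# Join on category only: each df1 row is matched with every df2 row of the
# same category.  Columns shared by both inputs get the suffixes .1 and .2.
pairs <- merge(df1, df2, by = "category", suffixes = c(".1", ".2"))

# Subtracting two Date columns gives a difference in days.
pairs$days <- as.integer(pairs$date.1 - pairs$date.2)

pairs[, c("category", "A.1", "A.2", "days")]
```

The result is in long form (one row per pair) rather than one matrix per
category, so the A.1 and A.2 columns are what identify which two rows each
difference came from.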


Regards,

Jonathan


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
