Absolutely... so long as you assume the dates are in order -- or at least that the earliest date of a group appears first.
-- Bert

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
Sent: Monday, January 18, 2010 12:15 PM
To: Bert Gunter; rusers.sh; r-help@r-project.org
Subject: Re: [R] problem of data manipulation

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> Sent: Monday, January 18, 2010 11:54 AM
> To: 'rusers.sh'; r-help@r-project.org
> Subject: Re: [R] problem of data manipulation
>
> One way to do it:
>
> 1. Convert your date column to the Date class using the as.Date() function.
> This allows you to do the necessary arithmetic on the dates below.
>    dt <- as.Date(a[,4],"%d/%m/%Y")
>
> 2. Create a factor out of your first three columns whose levels are in the
> same order as the unique rows. Something like the following should do it:
>    fac <- do.call(paste,a[,-4])
>    fac <- factor(fac, levels=unique(fac))
>
> This allows you to choose the groups of rows whose dates you wish to compare
> and maintain their correct order in the data frame.
>
> 3. Then use tapply:
>    a[unlist(tapply(dt,fac,function(x)x-min(x) < 7)),]

You can do this without unpacking and repacking the data.frame
(with tapply) based on the following sort of calculation:

   > isFirstInRun <- function(x)c(TRUE, x[-1] != x[-length(x)])
   > f <- with(a, isFirstInRun(var1) | isFirstInRun(var2) | isFirstInRun(var3))
   > firstRowInRun <- which(f)
   > runNumber <- cumsum(f)
   > dt <- as.Date(a$var4, "%d/%m/%Y")
   > DaysSinceStartOfRun <- dt - dt[firstRowInRun[runNumber]]
   > DaysSinceStartOfRun
   Time differences in days
   [1]  0  0  3  0  4 12
   > a[ DaysSinceStartOfRun < 7, ]
     var1 var2 var3       var4
   1    s    1    2 01/01/1999
   2    c    1    2 10/02/2000
   3    c    1    2 13/02/2000
   4    n    2    1 11/02/2000
   5    n    2    1 15/02/2000

Is that what you wanted?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> (unlist is needed to remove the list structure and concatenate the logical
> indices to obtain the subscripting vector).
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of rusers.sh
> Sent: Monday, January 18, 2010 10:40 AM
> To: r-help@r-project.org
> Subject: [R] problem of data manipulation
>
> Hello,
> See my problem below.
> a<-data.frame(c("s","c","c","n","n","n"),c(rep(1,3),rep(2,3)),
>               c(rep(2,3),rep(1,3)),
>               c("01/01/1999","10/02/2000","13/02/2000",
>                 "11/02/2000","15/02/2000","23/02/2000"))
> colnames(a)<-c("var1","var2","var3","var4")
> > a
>   var1 var2 var3       var4
> 1    s    1    2 01/01/1999
> 2    c    1    2 10/02/2000
> 3    c    1    2 13/02/2000
> 4    n    2    1 11/02/2000
> 5    n    2    1 15/02/2000
> 6    n    2    1 23/02/2000
>
> I want to select the observations whose difference in "var4" is less than
> 7 days, among the cases with the same values of var1, var2 and var3.
> The observations that have the same var1, var2 and var3 are part1 (obs2 and
> obs3) and part2 (obs4, obs5, and obs6).
> For obs2 and obs3, the date difference is less than 7, so we do not need
> to delete either of them.
> For obs4, obs5, and obs6, we can see that obs6 should be deleted because its
> date is more than 7 days later than that of obs4.
> So the final dataset should be obs1, obs2, obs3, obs4, and obs5.
> I have a lot of observations in my dataset, so I hope to do this
> automatically. Any ideas on this?
> Thanks.
> --
> -----------------
> Jane Chang
> Queen's
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
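
A note on the ordering caveat in the top reply: the run-based calculation measures each date against the first row of its run, so it relies on the earliest date of a group appearing first. What follows is only a sketch of one way to drop that assumption, using base R's ave() to attach each group's earliest date to every row of the group. The names grp, groupStart, daysSinceGroupStart and kept are illustrative and do not appear in the thread. Unlike the run-based version, this also treats non-adjacent rows with identical var1, var2 and var3 as one group, which may or may not be what is wanted.

```r
# Example data from the original question
a <- data.frame(var1 = c("s", "c", "c", "n", "n", "n"),
                var2 = c(rep(1, 3), rep(2, 3)),
                var3 = c(rep(2, 3), rep(1, 3)),
                var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
                         "11/02/2000", "15/02/2000", "23/02/2000"))

# as.character() guards against var4 having been read in as a factor
dt <- as.Date(as.character(a$var4), "%d/%m/%Y")

# One group label per row, built from the first three columns
# (illustrative name, not from the thread)
grp <- interaction(a$var1, a$var2, a$var3, drop = TRUE)

# Earliest date of each group, repeated for every row of that group.
# The minimum is taken over the whole group, so row order is irrelevant.
groupStart <- ave(as.numeric(dt), grp, FUN = min)
daysSinceGroupStart <- as.numeric(dt) - groupStart

# Keep rows dated less than 7 days after their group's earliest date
kept <- a[daysSinceGroupStart < 7, ]
print(kept)
```

On the example data this keeps rows 1 to 5, the same rows as the two approaches in the thread. If the data are already sorted by date within each group, the run-based version avoids the grouping step altogether.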