> -----Original Message-----
> From: Bert Gunter [mailto:gunter.ber...@gene.com]
> Sent: Monday, January 18, 2010 12:32 PM
> To: William Dunlap; 'rusers.sh'; r-help@r-project.org
> Subject: RE: [R] problem of data manipulation
>
> Absolutely... so long as you assume the dates are in order -- or at least
> that the earliest date of a group appears first.
>
> -- Bert
>
Yes, I forgot to mention that requirement.

When there are a lot of small groups, run-based methods (sort, then deal
with one run at a time) can save a lot of time. They may also make the
intent of the code clearer, but not everyone sees it that way.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
> Sent: Monday, January 18, 2010 12:15 PM
> To: Bert Gunter; rusers.sh; r-help@r-project.org
> Subject: Re: [R] problem of data manipulation
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> > Sent: Monday, January 18, 2010 11:54 AM
> > To: 'rusers.sh'; r-help@r-project.org
> > Subject: Re: [R] problem of data manipulation
> >
> > One way to do it:
> >
> > 1. Convert your date column to the Date class using the as.Date()
> > function. This allows you to do the necessary arithmetic on the
> > dates below.
> >
> >    dt <- as.Date(a[,4],"%d/%m/%Y")
> >
> > 2. Create a factor out of your first three columns whose levels are
> > in the same order as the unique rows. Something like the following
> > should do it:
> >
> >    fac <- do.call(paste,a[,-4])
> >    fac <- factor(fac, levels=unique(fac))
> >
> > This allows you to choose the groups of rows whose dates you wish to
> > compare and maintain their correct order in the data frame.
> >
> > 3. Then use tapply:
> >
> >    a[unlist(tapply(dt,fac,function(x)x-min(x) < 7)),]
>
> You can do this without unpacking and repacking
> the data.frame (with tapply) based on the following
> sort of calculation:
>
>   > isFirstInRun <- function(x)c(TRUE, x[-1] != x[-length(x)])
>   > f <- with(a, isFirstInRun(var1) | isFirstInRun(var2) |
>        isFirstInRun(var3))
>   > firstRowInRun <- which(f)
>   > runNumber <- cumsum(f)
>   > dt <- as.Date(a$var4, "%d/%m/%Y")
>   > DaysSinceStartOfRun <- dt - dt[firstRowInRun[runNumber]]
>   > DaysSinceStartOfRun
>   Time differences in days
>   [1]  0  0  3  0  4 12
>   > a[ DaysSinceStartOfRun < 7, ]
>     var1 var2 var3       var4
>   1    s    1    2 01/01/1999
>   2    c    1    2 10/02/2000
>   3    c    1    2 13/02/2000
>   4    n    2    1 11/02/2000
>   5    n    2    1 15/02/2000
>
> Is that what you wanted?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > (unlist is needed to remove the list structure and concatenate the
> > logical indices to obtain the subscripting vector).
> >
> > Bert Gunter
> > Genentech Nonclinical Statistics
> >
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of rusers.sh
> > Sent: Monday, January 18, 2010 10:40 AM
> > To: r-help@r-project.org
> > Subject: [R] problem of data manipulation
> >
> > Hello,
> > See my problem below.
> >
> > a<-data.frame(c("s","c","c","n","n","n"),c(rep(1,3),rep(2,3)),
> >   c(rep(2,3),rep(1,3)),
> >   c("01/01/1999","10/02/2000","13/02/2000","11/02/2000",
> >     "15/02/2000","23/02/2000"))
> > colnames(a)<-c("var1","var2","var3","var4")
> > > a
> >   var1 var2 var3       var4
> > 1    s    1    2 01/01/1999
> > 2    c    1    2 10/02/2000
> > 3    c    1    2 13/02/2000
> > 4    n    2    1 11/02/2000
> > 5    n    2    1 15/02/2000
> > 6    n    2    1 23/02/2000
> >
> > I want to select the observations whose difference in "var4" is less
> > than 7 for the cases with the same values of var1, var2 and var3.
> > The observations that have the same var1, var2 and var3 are part1
> > (obs2 and obs3) and part2 (obs4, obs5, and obs6).
> > For obs2 and obs3, their date difference is less than 7, so we do not
> > need to delete any of them.
> > For obs4, obs5, and obs6, we can see that obs6 should be deleted
> > because its date is more than 7 days later than that of obs4.
> > So the final dataset should be obs1, obs2, obs3, obs4, and obs5.
> > I have a lot of observations in my dataset, so I hope to do this
> > automatically. Any ideas on this?
> > Thanks.
> > --
> > -----------------
> > Jane Chang
> > Queen's
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
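
The run-based calculation in this thread relies on each group being contiguous with its earliest date first. Here is a rough sketch of the "sort, then deal with one run at a time" idea for data that does not arrive in that order. It assumes the column names var1 to var4 and the dd/mm/yyyy format from the original post. The function name keepWithinDays, its days argument, and the shuffled copy of the data are made up for illustration and do not come from the thread.

```r
# Example data from the original post (column names assumed: var1..var4).
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000"),
  stringsAsFactors = FALSE
)

# Hypothetical shuffled copy, so that the sort step actually matters.
shuffled <- a[c(6, 3, 1, 5, 2, 4), ]

# Hypothetical helper: keep rows whose date is fewer than `days` days
# after the earliest date in their (var1, var2, var3) group.
keepWithinDays <- function(df, days = 7) {
  dt <- as.Date(df$var4, "%d/%m/%Y")

  # Sort by the grouping columns and then by date, so every group forms
  # one run and the first row of each run carries the earliest date.
  ord <- order(df$var1, df$var2, df$var3, dt)
  df <- df[ord, ]
  dt <- dt[ord]

  # TRUE wherever a value differs from the one on the previous row.
  isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
  f <- isFirstInRun(df$var1) | isFirstInRun(df$var2) | isFirstInRun(df$var3)

  firstRowInRun <- which(f)   # row index where each run starts
  runNumber <- cumsum(f)      # run id for every row
  daysSinceStartOfRun <- dt - dt[firstRowInRun[runNumber]]

  df[daysSinceStartOfRun < days, ]
}

result <- keepWithinDays(shuffled)
print(result)
```

The result comes back in sorted order rather than the input order. If the input order matters, one option is to compute the logical index on the sorted data and map it back through the permutation returned by order().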