Thank you so much. I got it.

2010/1/18 William Dunlap <wdun...@tibco.com>
> > -----Original Message-----
> > From: Bert Gunter [mailto:gunter.ber...@gene.com]
> > Sent: Monday, January 18, 2010 12:32 PM
> > To: William Dunlap; 'rusers.sh'; r-help@r-project.org
> > Subject: RE: [R] problem of data manipulation
> >
> > Absolutely... so long as you assume the dates are in order, or at
> > least that the earliest date of a group appears first.
> >
> > -- Bert
>
> Yes, I forgot to mention that requirement. When there are a lot of
> small groups, run-based methods (sort, then deal with one run at a
> time) can save a lot of time. They may also make the intent of the
> code clearer, but not everyone sees it that way.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
> > Sent: Monday, January 18, 2010 12:15 PM
> > To: Bert Gunter; rusers.sh; r-help@r-project.org
> > Subject: Re: [R] problem of data manipulation
> >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org
> > > [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> > > Sent: Monday, January 18, 2010 11:54 AM
> > > To: 'rusers.sh'; r-help@r-project.org
> > > Subject: Re: [R] problem of data manipulation
> > >
> > > One way to do it:
> > >
> > > 1. Convert your date column to the Date class using the as.Date()
> > > function. This allows you to do the necessary arithmetic on the
> > > dates below.
> > > dt <- as.Date(a[,4],"%d/%m/%Y")
> > >
> > > 2. Create a factor out of your first three columns whose levels
> > > are in the same order as the unique rows. Something like the
> > > following should do it:
> > > fac <- do.call(paste,a[,-4])
> > > fac <- factor(fac, levels=unique(fac))
> > >
> > > This allows you to choose the groups of rows whose dates you wish
> > > to compare and maintain their correct order in the data frame.
> > >
> > > 3. Then use tapply:
> > > a[unlist(tapply(dt,fac,function(x)x-min(x) < 7)),]
> >
> > You can do this without unpacking and repacking the data.frame
> > (with tapply) based on the following sort of calculation:
> >
> > > isFirstInRun <- function(x)c(TRUE, x[-1] != x[-length(x)])
> > > f <- with(a, isFirstInRun(var1) | isFirstInRun(var2) | isFirstInRun(var3))
> > > firstRowInRun <- which(f)
> > > runNumber <- cumsum(f)
> > > dt <- as.Date(a$var4, "%d/%m/%Y")
> > > DaysSinceStartOfRun <- dt - dt[firstRowInRun[runNumber]]
> > > DaysSinceStartOfRun
> > Time differences in days
> > [1]  0  0  3  0  4 12
> > > a[ DaysSinceStartOfRun < 7, ]
> >   var1 var2 var3       var4
> > 1    s    1    2 01/01/1999
> > 2    c    1    2 10/02/2000
> > 3    c    1    2 13/02/2000
> > 4    n    2    1 11/02/2000
> > 5    n    2    1 15/02/2000
> >
> > Is that what you wanted?
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> > > (unlist is needed to remove the list structure and concatenate
> > > the logical indices to obtain the subscripting vector).
> > >
> > > Bert Gunter
> > > Genentech Nonclinical Statistics
> > >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org
> > > [mailto:r-help-boun...@r-project.org] On Behalf Of rusers.sh
> > > Sent: Monday, January 18, 2010 10:40 AM
> > > To: r-help@r-project.org
> > > Subject: [R] problem of data manipulation
> > >
> > > Hello,
> > > See my problem below.
> > > a<-data.frame(c("s","c","c","n","n","n"),c(rep(1,3),rep(2,3)),
> > >   c(rep(2,3),rep(1,3)),
> > >   c("01/01/1999","10/02/2000","13/02/2000","11/02/2000",
> > >     "15/02/2000","23/02/2000"))
> > > colnames(a)<-c("var1","var2","var3","var4")
> > > > a
> > >   var1 var2 var3       var4
> > > 1    s    1    2 01/01/1999
> > > 2    c    1    2 10/02/2000
> > > 3    c    1    2 13/02/2000
> > > 4    n    2    1 11/02/2000
> > > 5    n    2    1 15/02/2000
> > > 6    n    2    1 23/02/2000
> > >
> > > Among the cases with the same values of var1, var2 and var3, I
> > > want to select the observations whose "var4" date is less than 7
> > > days after the first one.
> > > The observations that share the same var1, var2 and var3 are
> > > part1 (obs2 and obs3) and part2 (obs4, obs5 and obs6).
> > > For obs2 and obs3, the date difference is less than 7 days, so we
> > > do not need to delete either of them.
> > > For obs4, obs5 and obs6, obs6 should be deleted because its date
> > > is more than 7 days later than that of obs4.
> > > So the final dataset should be obs1, obs2, obs3, obs4 and obs5.
> > > I have a lot of observations in my dataset, so I hope to do this
> > > automatically. Any ideas on this?
> > > Thanks.
> > > --
> > > -----------------
> > > Jane Chang
> > > Queen's
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.

--
Jane Chang
Queen's
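
A note on the ordering requirement discussed above: the run-based
calculation assumes the earliest date of each group comes first, and
the tapply version assumes the rows of each group sit together. The
sketch below is one possible way around both assumptions, not something
posted in the thread. It uses ave() to attach each group's earliest
date to every row, so row order does not matter. The helper name
keepWithinDays() and its arguments are made up for illustration, and it
assumes there are no missing dates.

```r
# Example data from the original post (wrapped date strings rejoined).
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000"),
  stringsAsFactors = FALSE
)

# keepWithinDays() is a hypothetical helper, not part of the thread.
# It keeps the rows whose date is fewer than `window` days after the
# earliest date among rows with the same values in `groupCols`.
keepWithinDays <- function(df, groupCols, dateCol, window = 7,
                           format = "%d/%m/%Y") {
  # Dates as day counts, so ave() works on a plain numeric vector.
  days <- as.numeric(as.Date(df[[dateCol]], format = format))
  # One grouping factor; drop = TRUE avoids empty level combinations.
  key <- interaction(df[groupCols], drop = TRUE)
  # Earliest day of each group, repeated for every row of that group.
  groupStart <- ave(days, key, FUN = min)
  df[(days - groupStart) < window, , drop = FALSE]
}

groupCols <- c("var1", "var2", "var3")
result <- keepWithinDays(a, groupCols, "var4")

# Same data in scrambled row order: the same observations are kept.
shuffled <- a[c(6, 3, 5, 1, 4, 2), ]
resultShuffled <- keepWithinDays(shuffled, groupCols, "var4")
```

On the example data this keeps obs1 to obs5 and drops obs6, both in the
original and in the scrambled order. With many small groups the
run-based method on sorted data may well be faster; this version only
trades that for not having to sort first.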