Thank you so much. I got it.

2010/1/18 William Dunlap <wdun...@tibco.com>

> > -----Original Message-----
> > From: Bert Gunter [mailto:gunter.ber...@gene.com]
> > Sent: Monday, January 18, 2010 12:32 PM
> > To: William Dunlap; 'rusers.sh'; r-help@r-project.org
> > Subject: RE: [R] problem of data manipulation
> >
> > Absolutely... so long as you assume the dates are in order -- or at least
> > that the earliest date of a group appears first.
> >
> > -- Bert
> >
>
> Yes, I forgot to mention that requirement.  When
> there are a lot of small groups, run-based methods
> (sort, then deal with one run at a time) can save a
> lot of time.  They may also make the intent of
> the code clearer, but not everyone sees it that way.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
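Bert's caveat (the dates must be in order, or at least each group's earliest date must come first) can be removed by doing the sort Bill mentions before applying the run-based calculation. The following is a minimal sketch under stated assumptions, not code from the thread: `keepWithinDays` and `shuffled` are hypothetical names, var4 is assumed to hold day/month/year strings as in the question, and "keep" is read as "less than `days` days after the earliest date of the row's (var1, var2, var3) group".

```r
# Example data from the original question, with column names set directly
a <- data.frame(var1 = c("s", "c", "c", "n", "n", "n"),
                var2 = c(1, 1, 1, 2, 2, 2),
                var3 = c(2, 2, 2, 1, 1, 1),
                var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
                         "11/02/2000", "15/02/2000", "23/02/2000"),
                stringsAsFactors = FALSE)

# TRUE where a value differs from the one before it (Bill's helper)
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])

# Hypothetical helper: keep rows dated less than `days` after the earliest
# date of their (var1, var2, var3) group. Rows may arrive in any order.
keepWithinDays <- function(df, days = 7) {
  dt <- as.Date(df$var4, "%d/%m/%Y")
  o <- order(df$var1, df$var2, df$var3, dt)  # groups become runs, earliest first
  s <- df[o, ]
  sdt <- dt[o]
  f <- isFirstInRun(s$var1) | isFirstInRun(s$var2) | isFirstInRun(s$var3)
  runStart <- which(f)[cumsum(f)]            # sorted position of each row's run start
  elapsed <- as.numeric(sdt - sdt[runStart], units = "days")
  df[sort(o[elapsed < days]), ]              # restore the input row order
}

shuffled <- a[c(6, 2, 4, 1, 5, 3), ]
keepWithinDays(shuffled)   # drops only the 23/02/2000 row
```

The final `sort(o[...])` maps the kept sorted positions back to positions in the input, so the caller gets the surviving rows in their original order rather than in sorted order.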
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On
> > Behalf Of William Dunlap
> > Sent: Monday, January 18, 2010 12:15 PM
> > To: Bert Gunter; rusers.sh; r-help@r-project.org
> > Subject: Re: [R] problem of data manipulation
> >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org
> > > [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> > > Sent: Monday, January 18, 2010 11:54 AM
> > > To: 'rusers.sh'; r-help@r-project.org
> > > Subject: Re: [R] problem of data manipulation
> > >
> > > One way to do it:
> > >
> > > 1. Convert your date column to the Date class using the as.Date()
> > > function. This allows you to do the necessary arithmetic on the dates below.
> > > dt <- as.Date(a[,4],"%d/%m/%Y")
> > >
> > > 2. Create a factor out of your first three columns whose levels are in
> > > the same order as the unique rows. Something like the following should
> > > do it:
> > > fac <- do.call(paste,a[,-4])
> > > fac <- factor(fac, levels=unique(fac))
> > >
> > > This allows you to choose the groups of rows whose dates you wish to
> > > compare and to keep them in their correct order in the data frame.
> > >
> > > 3. Then use tapply:
> > > a[unlist(tapply(dt,fac,function(x)x-min(x) < 7)),]
> >
> > You can do this without unpacking and repacking
> > the data.frame (with tapply) based on the following
> > sort of calculation:
> >
> >   > isFirstInRun <- function(x)c(TRUE, x[-1] != x[-length(x)])
> >   > f <- with(a, isFirstInRun(var1) | isFirstInRun(var2) |
> >   +     isFirstInRun(var3))
> >   > firstRowInRun <- which(f)
> >   > runNumber <- cumsum(f)
> >   > dt <- as.Date(a$var4, "%d/%m/%Y")
> >   > DaysSinceStartOfRun <- dt - dt[firstRowInRun[runNumber]]
> >   > DaysSinceStartOfRun
> >   Time differences in days
> >   [1]  0  0  3  0  4 12
> >   > a[ DaysSinceStartOfRun < 7, ]
> >     var1 var2 var3       var4
> >   1    s    1    2 01/01/1999
> >   2    c    1    2 10/02/2000
> >   3    c    1    2 13/02/2000
> >   4    n    2    1 11/02/2000
> >   5    n    2    1 15/02/2000
> >
> > Is that what you wanted?
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
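The key step in Bill's run-based method is the subscript `dt[firstRowInRun[runNumber]]`, which maps every row to the first row of its run. Using the var1 labels of the six example rows (which here happen to identify the groups on their own), the intermediate vectors work out as follows; `key` is an illustrative name.

```r
# TRUE where a value differs from the one before it (Bill's helper)
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])

key <- c("s", "c", "c", "n", "n", "n")   # group labels of the six example rows
f <- isFirstInRun(key)                    # TRUE TRUE FALSE TRUE FALSE FALSE
runNumber <- cumsum(f)                    # 1 2 2 3 3 3
firstRowInRun <- which(f)                 # 1 2 4
firstRowInRun[runNumber]                  # 1 2 2 4 4 4
```

Subtracting `dt[firstRowInRun[runNumber]]` from `dt` therefore measures each row against the first row of its run, which is why each group's rows must be adjacent and its earliest date must come first.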
> > >
> > > (unlist is needed to remove the list structure and concatenate the
> > > logical indices to obtain the subscripting vector.)
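Bert's three steps can be put together into one runnable sketch. As a variant (not Bert's code), step 3 here uses ave() in place of tapply() and unlist(): ave() returns one value per row, aligned with the rows of `a`, so the subscript no longer relies on each group's rows being adjacent. The data frame is rebuilt with explicit column names, and `key` and `daysFromFirst` are illustrative names.

```r
# Example data from the original question, with column names set directly
a <- data.frame(var1 = c("s", "c", "c", "n", "n", "n"),
                var2 = c(1, 1, 1, 2, 2, 2),
                var3 = c(2, 2, 2, 1, 1, 1),
                var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
                         "11/02/2000", "15/02/2000", "23/02/2000"))

dt  <- as.Date(a$var4, "%d/%m/%Y")   # step 1: parse day/month/year strings
key <- do.call(paste, a[, -4])       # step 2: one label per (var1, var2, var3)

# Step 3 via ave(): days since the earliest date in each row's group,
# returned in the original row order.
daysFromFirst <- ave(as.numeric(dt), key, FUN = function(x) x - min(x))
daysFromFirst                        # 0 0 3 0 4 12
a[daysFromFirst < 7, ]               # rows 1 to 5
```

Because min() is taken within each group, this version also does not need the earliest date of a group to come first.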
> > >
> > > Bert Gunter
> > > Genentech Nonclinical Statistics
> > >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org
> > > [mailto:r-help-boun...@r-project.org] On
> > > Behalf Of rusers.sh
> > > Sent: Monday, January 18, 2010 10:40 AM
> > > To: r-help@r-project.org
> > > Subject: [R] problem of data manipulation
> > >
> > > Hello,
> > >   See my problem below.
> > > a <- data.frame(c("s","c","c","n","n","n"), c(rep(1,3),rep(2,3)),
> > >                 c(rep(2,3),rep(1,3)),
> > >                 c("01/01/1999","10/02/2000","13/02/2000","11/02/2000",
> > >                   "15/02/2000","23/02/2000"))
> > > colnames(a) <- c("var1","var2","var3","var4")
> > > > a
> > >   var1 var2 var3       var4
> > > 1    s    1    2 01/01/1999
> > > 2    c    1    2 10/02/2000
> > > 3    c    1    2 13/02/2000
> > > 4    n    2    1 11/02/2000
> > > 5    n    2    1 15/02/2000
> > > 6    n    2    1 23/02/2000
> > >
> > >   I want to select the observations whose var4 date is less than 7 days
> > > after the earliest var4 date among the cases with the same values of
> > > var1, var2 and var3.
> > >   The observations that share var1, var2 and var3 are part1 (obs2 and
> > > obs3) and part2 (obs4, obs5 and obs6).
> > >   For obs2 and obs3, the date difference is less than 7 days, so we do
> > > not need to delete either of them.
> > >   For obs4, obs5 and obs6, obs6 should be deleted because its date is
> > > more than 7 days later than that of obs4.
> > >   So the final dataset should be obs1, obs2, obs3, obs4 and obs5.
> > >   I have a lot of observations in my dataset, so I hope to do this
> > > automatically. Any ideas on this?
> > >   Thanks.
> > > --
> > > -----------------
> > > Jane Chang
> > > Queen's
> > >
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
>



-- 
-----------------
Jane Chang
Queen's


