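Try this:

  require(zoo)  # supplies as.Date() for plain numbers without an explicit origin
  # latest visit per unique_id; tapply() drops the Date class and returns day counts
  lvd <- tapply(df$visit_date, df$unique_id, max)
  # with no FUN, tapply() returns each row's index into the groups
  index <- tapply(df$visit_date, df$unique_id)
  # look up each row's group maximum and convert it back to a Date
  df$last_visit_date <- as.Date(lvd[index])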
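
For comparison, here is a minimal self-contained sketch of the same idea using base R's ave(), which returns the group-wise result already repeated on every row, so no separate index lookup is needed. The data frame below is an assumption: it is rebuilt by hand from the example table in the question, not taken from the real dataset. The dates are converted to day counts first, so the result does not depend on how ave() treats the Date class.

```r
# Sample data copied from the table in the question; dates are day/month/year.
df <- data.frame(
  unique_id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 4),
  visit_date = as.Date(
    c("01/06/2010", "01/01/2011", "01/06/2011",
      "01/01/2009", "01/06/2009", "01/06/2010", "01/01/2011", "01/07/2011",
      "01/01/2008",
      "01/01/2009", "01/01/2010"),
    format = "%d/%m/%Y"
  )
)

# ave() applies FUN within each unique_id and returns a vector as long as
# its input, with the group maximum repeated on every row of that group.
days <- ave(as.numeric(df$visit_date), df$unique_id, FUN = max)

# Convert the day counts back to Date; the explicit origin avoids needing zoo.
df$last_visit_date <- as.Date(days, origin = "1970-01-01")

print(df)
```

Note that FUN has to be passed by name in ave(), because unnamed arguments after the first are treated as grouping variables.
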
Jean Kathleen Rollet wrote on 08/24/2011 04:15:45 PM:

> Dear R users,
>
> I am encountering the following problem: I have a dataset with a
> 'unique_id' and different 'visit_date' values (formatted with as.Date,
> "%d/%m/%Y") per unique_id. I would like to create a new variable with
> the most recent date of visit per unique_id, as shown below.
>
> unique_id  visit_date  last_visit_date
> 1          01/06/2010  01/06/2011
> 1          01/01/2011  01/06/2011
> 1          01/06/2011  01/06/2011
> 2          01/01/2009  01/07/2011
> 2          01/06/2009  01/07/2011
> 2          01/06/2010  01/07/2011
> 2          01/01/2011  01/07/2011
> 2          01/07/2011  01/07/2011
> 3          01/01/2008  01/01/2008
> 4          01/01/2009  01/01/2010
> 4          01/01/2010  01/01/2010
>
> I know the coding to easily do this in Stata, SAS, and Excel, but I
> cannot find how to do it in R. I tried multiple functions such as
> tapply(), ave(), ddply(), and transform() after looking into
> previous postings. The code runs, but only NA values are
> generated, or I get error messages that the replacement has fewer rows
> than the data has (there are about 1000 unique_id values and over 4000
> rows in my dataset at present).
> I would greatly appreciate it if someone could help me.
>
> Thank you!
>
> Kathleen R.
> Epidemiologist
> Montreal, QC, Canada
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.