Hi: Since you tried several functions (reasonably so IMO), here is how they would work in this problem, in addition to the solutions already supplied.
Some data massaging before starting, taking your data as input, saved into an object named visits: visits <- structure(list(unique_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L), visit_date = structure(c(14615, 14975, 14980, 14245, 14250, 14615, 14975, 14981, 13879, 14245, 14610), class = "Date")), .Names = c("unique_id", "visit_date"), row.names = c(NA, -11L), class = "data.frame") # plyr package: ddply(visits, .(unique_id), transform, last_visit_date = max(visit_date)) # Faster version, using the more recent function mutate(): mutate(visits, .(unique_id), last_visit_date = max(visit_date)) # data.table: library(data.table) # Create the data table from a data frame, using unique_id as a key: visDT <- data.table(visits, key = 'unique_id') # list() is used to output multiple variables: visDT[, list(visit_date, last_visit_date = max(visit_date)), by = 'unique_id'] # doBy package: library(doBy) transformBy(~ unique_id, data = visits, last_visit_date = max(visit_date)) # base package, using transform(): transform(visits, last_visit_date = ave(visit_date, unique_id, FUN = max)) # ...which you could have gotten from the original data as follows, including # the conversion of visit_date to a Date variable assuming it was read in as character # rather than factor, as below: visits0 <- structure(list(unique_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L), visit_date = c("01/06/2010", "01/01/2011", "01/06/2011", "01/01/2009", "01/06/2009", "01/06/2010", "01/01/2011", "01/07/2011", "01/01/2008", "01/01/2009", "01/01/2010")), .Names = c("unique_id", "visit_date"), row.names = c(NA, -11L), class = "data.frame") str(visits0) within(visits0, { visit_date <- as.Date(visit_date, format = '%m/%d/%Y') last_visit_date <- ave(visit_date, unique_id, FUN = max) } ) # All of the above produce unique_id visit_date last_visit_date 1 1 2010-01-06 2011-01-06 2 1 2011-01-01 2011-01-06 3 1 2011-01-06 2011-01-06 4 2 2009-01-01 2011-01-07 5 2 2009-01-06 2011-01-07 6 2 2010-01-06 2011-01-07 7 2 2011-01-01 2011-01-07 8 2 2011-01-07 2011-01-07 9 3 2008-01-01 2008-01-01 10 4 2009-01-01 2010-01-01 11 4 2010-01-01 2010-01-01 With basic summaries such as this, there are a wealth of options available. mutate() is faster than ddply() for transform operations, data.table can be very fast, especially in large data sets, and the within() statement above shows how to perform the entire task (including conversion to dates) in one fell swoop. Notice that by using within(), one can convert visit_date to a Date object and then use it as input to the next function. You should be able to do this with the mysql package, too, but my SQL programming skills are pretty limited so I'll pass on that one. I made a weak effort but....no. HTH, Dennis On Wed, Aug 24, 2011 at 2:15 PM, Kathleen Rollet <kathleen...@hotmail.com> wrote: > > > > > > Dear R users, > > I am encoutering the following problem: I have a dataset with a 'unique_id' > and different 'visit_date' (formatted as.Date, "%d/%m/%Y") per unique_id. I > would like to create a new variable with the most recent date of visit per > unique_id as shown below. > > unique_id visit_date last_visit_date > 1 01/06/2010 01/06/2011 > 1 01/01/2011 01/06/2011 > 1 01/06/2011 01/06/2011 > 2 01/01/2009 01/07/2011 > 2 01/06/2009 01/07/2011 > 2 01/06/2010 01/07/2011 > 2 01/01/2011 01/07/2011 > 2 01/07/2011 01/07/2011 > 3 01/01/2008 01/01/2008 > 4 01/01/2009 01/01/2010 > 4 01/01/2010 01/01/2010 > > I know the coding to easily do this in Stata, SAS, and Excel but I cannot > find how to do it in R. I try multiple function such as tapply( ), ave( ), > ddply ( ), and transform ( ) after looking into previous postings. The codes > are running but only NA values are generated or I get error messages that the > replacement has less row than the data has (there are about 1000 unique_id > and over 4000 rows in my dataset presently). > I would greatly appreciate if someone could help me. > > Thank you! > > Kathleen R. > Epidemiologist > Montreal, QC, Canada > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.