Re: [R] Creating new variable with maximum visit date by group_id

Dennis Murphy Wed, 24 Aug 2011 22:22:51 -0700

Hi:

Since you tried several functions (reasonably so IMO), here is how
they would work in this problem, in addition to the solutions already
supplied.


Some data massaging before starting, taking your data as input, saved
into an object named visits:

visits <- structure(list(unique_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 4L, 4L), visit_date = structure(c(14615, 14975, 14980, 14245,
14250, 14615, 14975, 14981, 13879, 14245, 14610), class = "Date")),
.Names = c("unique_id",
"visit_date"), row.names = c(NA, -11L), class = "data.frame")

# plyr package:
ddply(visits, .(unique_id), transform, last_visit_date = max(visit_date))

# Faster version, using the more recent function mutate():
mutate(visits, .(unique_id), last_visit_date = max(visit_date))

# data.table:
library(data.table)
# Create the data table from a data frame, using unique_id as a key:
visDT <- data.table(visits, key = 'unique_id')

#  list() is used to output multiple variables:
visDT[, list(visit_date, last_visit_date = max(visit_date)), by = 'unique_id']

# doBy package:
library(doBy)
transformBy(~ unique_id, data = visits, last_visit_date = max(visit_date))

# base package, using transform():
transform(visits, last_visit_date = ave(visit_date, unique_id, FUN = max))

# ...which you could have gotten from the original data as follows, including
# the conversion of visit_date to a Date variable assuming it was read
in as character
# rather than factor, as below:

visits0 <- structure(list(unique_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 4L, 4L), visit_date = c("01/06/2010", "01/01/2011", "01/06/2011",
"01/01/2009", "01/06/2009", "01/06/2010", "01/01/2011", "01/07/2011",
"01/01/2008", "01/01/2009", "01/01/2010")), .Names = c("unique_id",
"visit_date"), row.names = c(NA, -11L), class = "data.frame")
str(visits0)

within(visits0, {
     visit_date <- as.Date(visit_date, format = '%m/%d/%Y')
     last_visit_date <- ave(visit_date, unique_id, FUN = max)
       }  )

# All of the above produce
   unique_id visit_date last_visit_date
1          1 2010-01-06      2011-01-06
2          1 2011-01-01      2011-01-06
3          1 2011-01-06      2011-01-06
4          2 2009-01-01      2011-01-07
5          2 2009-01-06      2011-01-07
6          2 2010-01-06      2011-01-07
7          2 2011-01-01      2011-01-07
8          2 2011-01-07      2011-01-07
9          3 2008-01-01      2008-01-01
10         4 2009-01-01      2010-01-01
11         4 2010-01-01      2010-01-01

With basic summaries such as this, there are a wealth of options
available. mutate() is faster than ddply() for transform operations,
data.table can be very fast, especially in large data sets, and the
within() statement above shows how to perform the entire task
(including conversion to dates) in one fell swoop. Notice that by
using within(), one can convert visit_date to a Date object and then
use it as input to the next function.

You should be able to do this with the mysql package, too, but my SQL
programming skills are pretty limited so I'll pass on that one. I made
a weak effort but....no.

HTH,
Dennis


On Wed, Aug 24, 2011 at 2:15 PM, Kathleen Rollet
<kathleen...@hotmail.com> wrote:
>
>
>
>
>
> Dear R users,
>
> I am encoutering the following problem: I have a dataset with a 'unique_id' 
> and different 'visit_date' (formatted as.Date, "%d/%m/%Y") per unique_id. I 
> would like to create a new variable with the most recent date of visit per 
> unique_id as shown below.
>
> unique_id visit_date last_visit_date
> 1  01/06/2010  01/06/2011
> 1  01/01/2011  01/06/2011
> 1  01/06/2011  01/06/2011
> 2  01/01/2009  01/07/2011
> 2  01/06/2009  01/07/2011
> 2  01/06/2010  01/07/2011
> 2  01/01/2011  01/07/2011
> 2  01/07/2011  01/07/2011
> 3  01/01/2008  01/01/2008
> 4  01/01/2009  01/01/2010
> 4  01/01/2010  01/01/2010
>
> I know the coding to easily do this in Stata, SAS, and Excel but I cannot 
> find how to do it in R. I try multiple function such as tapply( ), ave( ), 
> ddply ( ), and transform ( ) after looking into previous postings. The codes 
> are running but only NA values are generated or I get error messages that the 
> replacement has less row than the data has (there are about 1000 unique_id 
> and over 4000 rows in my dataset presently).
> I would greatly appreciate if someone could help me.
>
> Thank you!
>
> Kathleen R.
> Epidemiologist
> Montreal, QC, Canada
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Creating new variable with maximum visit date by group_id

Reply via email to