> do.call("rbind",
>    by(df, INDICES=df$ID, FUN=function(DF) tail(DF, 2) ))

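To make the quoted suggestion concrete, here is a minimal sketch that rebuilds the small example dataset from the original question (quoted at the bottom of this message) and applies by() with tail() to it. The name df comes from the quoted code; last2 is just a name made up for this illustration. It assumes the rows are already in week order within each ID, since tail() simply takes the last rows as they appear.

```r
# Example data copied from the original question
df <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9),
  week    = c(2, 4, 6, 2, 6, 9, 9, 12, 2, 4, 6, 9, 12),
  outcome = c(14, 28, 42, 14, 46, 64, 71, 85, 14, 28, 51, 66, 84)
)

# Split the data frame by ID, keep the last 2 rows of each piece,
# and stack the pieces back into one data frame.
# Assumption: rows are already sorted by week within each ID.
last2 <- do.call("rbind",
                 by(df, INDICES = df$ID, FUN = function(DF) tail(DF, 2)))
last2
```

A group with fewer than 2 rows would contribute whatever rows it has, because tail() returns at most the number of rows requested.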
Another way to approach this sort of problem is to use ave() to assign
a within-group sequence number to each row and then select the rows with
the sequence numbers you want. You can also use ave() to make a column
giving the size of the group that each item is in. Hence you can select
things like "the last 2 items in each category that had at least 3 items".

E.g., here is a function to generate data on visits of patients to
a clinic, where the visits are listed in time order.

  makeData <- function(nVisits, Doctors=paste("Dr.",LETTERS[1:2]),
                       Patients=101:104, seed = 1) {
    if (!is.null(seed)) set.seed(seed)
    data.frame(Doctor=sample(Doctors, replace=TRUE, nVisits),
               Patient=sample(Patients, replace=TRUE, nVisits),
               Date=as.Date("2004-01-01")+sort(sample(2000, replace=TRUE, nVisits)))
  }

  # Make a 12-row dataset
  d <- makeData(12)
  # Add columns describing the visits between each doctor/patient pair
  d1 <- within(d, {
    N=ave(integer(length(Date)), Doctor, Patient, FUN=length)
    Seq=ave(integer(length(Date)), Doctor, Patient, FUN=seq_along)})
  d1
  #    Doctor Patient       Date Seq N
  # 1   Dr. A     103 2004-01-28   1 3
  # 2   Dr. A     102 2005-01-08   1 1
  # 3   Dr. B     104 2005-06-19   1 4
  # 4   Dr. B     102 2005-11-12   1 2
  # 5   Dr. A     103 2006-02-04   2 3
  # 6   Dr. B     104 2006-02-12   2 4
  # 7   Dr. B     102 2006-08-23   2 2
  # 8   Dr. B     104 2006-09-15   3 4
  # 9   Dr. B     104 2007-04-15   4 4
  # 10  Dr. A     101 2007-08-30   1 2
  # 11  Dr. A     103 2008-07-13   3 3
  # 12  Dr. A     101 2008-10-06   2 2

  # Show the last visit in each doctor/patient group
  d[d1$Seq==d1$N, ]
  #    Doctor Patient       Date
  # 2   Dr. A     102 2005-01-08
  # 7   Dr. B     102 2006-08-23
  # 9   Dr. B     104 2007-04-15
  # 11  Dr. A     103 2008-07-13
  # 12  Dr. A     101 2008-10-06

  # Show last 2 visits, but only if there were at least 2 visits
  d[d1$Seq>d1$N-2 & d1$N>=2, ]
  #    Doctor Patient       Date
  # 4   Dr. B     102 2005-11-12
  # 5   Dr. A     103 2006-02-04
  # 7   Dr. B     102 2006-08-23
  # 8   Dr. B     104 2006-09-15
  # 9   Dr. B     104 2007-04-15
  # 10  Dr. A     101 2007-08-30
  # 11  Dr. A     103 2008-07-13
  # 12  Dr. A     101 2008-10-06

  # Show the amount of time between the last two visits in a group
  # (if there were at least 2 visits). The rows are in time order, not
  # group order, so sort by group first; otherwise the two selections
  # would not line up group by group.
  ds <- d1[order(d1$Doctor, d1$Patient, d1$Seq), ]
  ds[ds$Seq==ds$N & ds$N>=2, "Date"] - ds[ds$Seq==ds$N-1 & ds$N>=2, "Date"]
  # Time differences in days
  # [1] 403 890 284 212

I find it easier to formulate the queries with this method. For large
datasets, selecting rows according to a criterion can be a lot faster
than splitting a data.frame into many parts, processing them with tail,
and combining them again.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of David Winsemius
> Sent: Thursday, October 11, 2012 2:13 PM
> To: bibek sharma
> Cc: r-help@r-project.org
> Subject: Re: [R] Selecting n observation
>
>
> On Oct 11, 2012, at 12:48 PM, bibek sharma wrote:
>
> > Hello R help,
> > I have a question similar to what is posted by someone before. my
> > problem is that Instead of last assessment, I want to choose last two.
> >
> > I have a data set with several time assessments for each participant.
> > I want to select the last assessment for each participant.
> > My dataset looks like this:
> > ID week outcome
> > 1  2    14
> > 1  4    28
> > 1  6    42
> > 4  2    14
> > 4  6    46
> > 4  9    64
> > 4  9    71
> > 4  12   85
> > 9  2    14
> > 9  4    28
> > 9  6    51
> > 9  9    66
> > 9  12   84
> >
> > Here is one solution for choosing last assessment
> > do.call("rbind",
> >         by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
>
> Why wouldn't the solution be something along the lines of:
>
> do.call("rbind",
>    by(df, INDICES=df$ID, FUN=function(DF) tail(DF, 2) ))
>
> >   ID week outcome
> > 1  1    6      42
> > 4  4   12      85
> > 9  9   12      84
> >
>
> David Winsemius, MD
> Alameda, CA, USA
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
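
For comparison, the ave() approach described in this reply might be applied to the original poster's data roughly as follows. This is only a sketch: the column names Seq and N mirror the clinic example, last2 is a made-up name, and the rows are assumed to be in week order within each ID, as they are in the posted data.

```r
# Example data copied from the original question
df <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9),
  week    = c(2, 4, 6, 2, 6, 9, 9, 12, 2, 4, 6, 9, 12),
  outcome = c(14, 28, 42, 14, 46, 64, 71, 85, 14, 28, 51, 66, 84)
)

# Within-ID sequence number and within-ID group size
df$Seq <- ave(integer(nrow(df)), df$ID, FUN = seq_along)
df$N   <- ave(integer(nrow(df)), df$ID, FUN = length)

# Keep the last two assessments of each participant
# (assumption: rows are sorted by week within each ID)
last2 <- df[df$Seq > df$N - 2, c("ID", "week", "outcome")]
last2
```

Adding a condition such as df$N >= 2 to the row selection would restrict the result to participants with at least two assessments, in the same way as the clinic example.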