Hi Tyler,

> I've attached 100 rows of a data frame I am working with.
> I have one factor, id, with 27 levels. There are two columns of reference
> data, x and y (UTM coordinates), one column "date" in POSIXct format, and
> one column "diff" in times format (chron package).
>
> What I am trying to do is as follows:
> For each day of the year (date, irrespective of time), select that row for
> each id which contains the smallest "diff" value, resulting in an output
> containing in general one value per id per day.

There's a basic strategy that makes solving this type of problem much
easier. I call it split-apply-combine. The basic idea is that if you
had a single day, the problem would be pretty easy:

df <- read.csv("http://www.nabble.com/file/p18018170/subdata.csv")
oneday <- subset(df, day == "01-01-05")
oneday[which.min(oneday$diff), ]

# Let's make that into a function to make it easier to apply to all days
mindiff <- function(df) df[which.min(df$diff), ]

# Now we split up the data frame so that we have a data frame for
# each day
pieces <- split(df, df$day)

# And use lapply to apply that function to each piece:
results <- lapply(pieces, mindiff)

# Then finally join all the pieces back together
df_done <- do.call("rbind", results)

So we split the data frame into individual days, picked the correct
row for each day, and then joined all the pieces back together. This
isn't the most efficient solution, but I think it's easy to see how
each part works, and how you can apply it to new situations.

If you aren't familiar with lapply or do.call, it's worth having a
look at their examples to get a feel for how they work (although for
this case you can of course just copy and paste them without caring
how they work).

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
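
The original question asked for one row per id per day, while the steps above split on day alone. The same split-apply-combine pattern should extend to both grouping variables by passing a list to split. What follows is only a minimal sketch under stated assumptions: the toy data frame is invented for illustration, it reuses the column names day, id and diff from the reply, and diff is plain numeric here rather than a chron times value.

```r
# Toy data, made up for illustration only (not the attached subdata.csv)
df <- data.frame(
  id   = c("a", "a", "b", "b", "a", "b"),
  day  = c("01-01-05", "01-01-05", "01-01-05", "01-01-05",
           "01-02-05", "01-02-05"),
  diff = c(0.30, 0.10, 0.25, 0.40, 0.05, 0.20),
  stringsAsFactors = FALSE
)

# Pick the row with the smallest diff from a single piece
mindiff <- function(piece) piece[which.min(piece$diff), ]

# Split on every day/id combination; drop = TRUE discards
# combinations that never occur in the data
pieces <- split(df, list(df$day, df$id), drop = TRUE)

# Apply the function to each piece, then join the pieces back together
results <- lapply(pieces, mindiff)
df_done <- do.call("rbind", results)

# The row names inherited from split() are not needed
rownames(df_done) <- NULL
df_done
```

Without drop = TRUE, split would also return an empty data frame for every day/id pair that has no rows. Those contribute nothing to the final rbind, but dropping them keeps the list of pieces smaller.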