Hi Bill, Modifying `f2` seems to solve the problem.
f2 <- function (data) {
    library(dplyr)
    data %>%
        group_by(id, value) %>%
        mutate(date = as.Date(date)) %>%
        arrange(date) %>%
        filter(any(c(abs(diff(date)), NA) > 31) & date == min(date)) %>%
        # keep a single row when several readings share the earliest date
        filter(row_number() == 1)
}

f2(dat)
Source: local data frame [3 x 3]
Groups: id, value

  id       date value
1  a 2000-01-01     x
2  c 2000-09-10     y
3  c 2000-10-11     z

f2(dat1)
Source: local data frame [1 x 3]
Groups: id, value

  id value       date
1  A     x 2000-10-02

A.K.


On Wednesday, July 16, 2014 4:25 PM, William Dunlap <wdun...@tibco.com> wrote:

>   filter(any(c(abs(diff(as.Date(date))),NA)>31)& date == min(date))

Note that the 'date == min(date)' will cause superfluous output rows
when there are several readings on the initial date for a given id/value
pair.  E.g.,

> dat1 <- data.frame(stringsAsFactors=FALSE, id=rep("A", 4), value=rep("x", 4),
>                    date=as.Date("2000-10-1")+c(1,1,50,50))
> f2(dat1) # want 1 output row: A, x, 2000-10-2
Source: local data frame [2 x 3]
Groups: id, value

  id value       date
1  A     x 2000-10-02
2  A     x 2000-10-02

where f2 is your code wrapped up in a function (to make testing and use easier)

f2 <- function (data) {
    library(dplyr)
    data %>%
        group_by(id, value) %>%
        arrange(date = as.Date(date)) %>%
        filter(any(c(abs(diff(as.Date(date))), NA) > 31) & date == min(date))
}

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Jul 16, 2014 at 7:49 AM, arun <smartpink...@yahoo.com> wrote:
> Hi,
> If `dat` is the dataset
>
> library(dplyr)
> dat%>%
>   group_by(id,value)%>%
>   arrange(date=as.Date(date))%>%
>   filter(any(c(abs(diff(as.Date(date))),NA)>31)& date == min(date))
> #Source: local data frame [3 x 3]
> #Groups: id, value
> #
> #  id       date value
> #1  a 2000-01-01     x
> #2  c 2000-09-10     y
> #3  c 2000-10-11     z
> A.K.
>
>
> On Wednesday, July 16, 2014 9:10 AM, Williams Scott
> <scott.willi...@petermac.org> wrote:
> Hi R experts,
>
> I have a dataset as sampled below. Values are only regarded as 'confirmed'
> in an individual ('id') if they occur more than once at least 30 days apart.
>
> id  date        value
> a   2000-01-01  x
> a   2000-03-01  x
> b   2000-11-11  w
> c   2000-11-11  y
> c   2000-10-01  y
> c   2000-09-10  y
> c   2000-12-12  z
> c   2000-10-11  z
> d   2000-11-11  w
> d   2000-11-10  w
>
> I wish to subset the data to retain rows where the value for the
> individual is confirmed more than 30 days apart. So, after deleting all
> rows with just one occurrence of id and value, the rest would be the
> earliest occurrence of each value in each case id, provided 31 or more
> days exist between the dates. If >1 value is present per id, each value
> level needs to be assessed independently. This example would then reduce
> to:
>
> id  date        value
> a   2000-01-01  x
> c   2000-09-10  y
> c   2000-10-11  z
>
> I can do this via some crude loops and subsetting, but I am looking for as
> much efficiency as possible as the dataset has around 50 million rows to
> assess. Any suggestions welcomed.
>
> Thanks in advance
>
> Scott Williams MD
> Melbourne, Australia
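
For anyone who wants to test the rule without dplyr, here is a minimal base R sketch built on the sample data from the original question. It is not the code posted in this thread. The helper name `confirmed_first` and its `min_gap` argument are hypothetical, and it assumes that "confirmed" means the earliest and latest readings of an id/value pair are at least 31 days apart, whereas the dplyr versions in the thread compare consecutive dates. It has not been benchmarked on anything near 50 million rows.

```r
# Sample data from the original question (dates supplied as character).
dat <- data.frame(
  id    = c("a", "a", "b", "c", "c", "c", "c", "c", "d", "d"),
  date  = c("2000-01-01", "2000-03-01", "2000-11-11", "2000-11-11",
            "2000-10-01", "2000-09-10", "2000-12-12", "2000-10-11",
            "2000-11-11", "2000-11-10"),
  value = c("x", "x", "w", "y", "y", "y", "z", "z", "w", "w"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the thread): return the
# earliest row of each id/value pair whose first and last readings are
# at least `min_gap` days apart.
confirmed_first <- function(data, min_gap = 31) {
  data$date <- as.Date(data$date)
  days <- as.numeric(data$date)

  # One grouping factor per id/value pair; drop = TRUE avoids empty groups.
  grp <- interaction(data$id, data$value, drop = TRUE)

  # Earliest and latest day of each group, recycled to one value per row.
  first <- ave(days, grp, FUN = min)
  last  <- ave(days, grp, FUN = max)

  # Keep rows on the earliest date of groups that span enough days.
  out <- data[(last - first) >= min_gap & days == first, ]

  # Several readings may share the earliest date; keep only one per pair.
  out <- out[!duplicated(out[c("id", "value")]), ]
  out <- out[order(out$id, out$value), ]
  rownames(out) <- NULL
  out
}

# Prints three rows: a/x 2000-01-01, c/y 2000-09-10, c/z 2000-10-11.
print(confirmed_first(dat))
```

Applied to Bill's `dat1` example, which has two readings on the initial date, this sketch returns a single row because of the `duplicated()` step.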
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.