Re: [R] how to subset based on other row values and multiplicity

2014-07-17 Thread arun
Hi Bill,
Modifying `f2` seems to solve the problem.
f2 <- function (data) {
    library(dplyr)
    data %>%
        group_by(id, value) %>%
        mutate(date = as.Date(date)) %>%
        arrange(date) %>%
        filter(any(c(abs(diff(date)), NA) > 31) & date == min(date)) %>%
        filter(row_number() == 1)
}

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread William Dunlap
> filter(any(c(abs(diff(as.Date(date))),NA)>31)& date == min(date))
Note that the 'date == min(date)' condition will produce superfluous output rows when there are several readings on the initial date for a given id/value pair. E.g.,
> dat1 <- data.frame(stringsAsFactors=FALSE, id=rep("A", 4), value=rep("x", 4)
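A small sketch of the tie problem described above. The dates in the original `dat1` are cut off in this excerpt, so the data frame `dup` below is hypothetical, as are the names `res_dup` and `res_one`; keeping one row per group with `slice(1)` is just one possible way to drop the extra rows.

```r
library(dplyr)

# Hypothetical data: two readings on the initial date, one later reading.
dup <- data.frame(
  id    = rep("A", 3),
  value = rep("x", 3),
  date  = as.Date(c("2001-01-01", "2001-01-01", "2001-03-15")),
  stringsAsFactors = FALSE
)

# Same kind of filter as quoted above: keep the earliest date of any
# id/value group that has a gap of more than 31 days.
res_dup <- dup %>%
  group_by(id, value) %>%
  filter(any(diff(sort(date)) > 31) & date == min(date))
nrow(res_dup)  # 2: both rows dated 2001-01-01 pass 'date == min(date)'

# Keep only the first row of each group to remove the duplicate.
res_one <- res_dup %>%
  slice(1) %>%
  ungroup()
nrow(res_one)  # 1
```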

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread William Dunlap
Using base R you can solve this by doing some sorting and comparing the first and last dates in each id-value group. Computing the first and last dates can be vectorized.
f1 <- function(data) {
    # sort by id, break ties with value, break remaining ties with date
    sortedData <- data[with(data
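A minimal base R sketch of the sort-then-compare idea described above. It is not the `f1` from this message, which is cut off in this excerpt. The helper name `confirm_first_last`, the `min_gap` parameter, and the `>= 30` day threshold (taken from the wording of the original question) are assumptions, and the sample rows are only those visible in the original question. It also assumes there are no missing dates.

```r
# Rows visible in the original question (the excerpt is truncated, so this
# is only part of the poster's data).
dat <- data.frame(
  id    = c("a", "a", "b", "c", "c", "c"),
  date  = c("2000-01-01", "2000-03-01", "2000-11-11",
            "2000-11-11", "2000-10-01", "2000-09-10"),
  value = c("x", "x", "w", "y", "y", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first row of every id/value group whose
# first and last dates are at least 'min_gap' days apart.
confirm_first_last <- function(data, min_gap = 30) {
  data$date <- as.Date(data$date)
  # sort by id, break ties with value, break remaining ties with date
  s <- data[order(data$id, data$value, data$date), ]
  n <- nrow(s)
  if (n == 0L) return(s)
  # TRUE on the first row of each id/value group (vectorized comparison
  # of each row with the previous one)
  is_first <- c(TRUE, s$id[-1] != s$id[-n] | s$value[-1] != s$value[-n])
  # TRUE on the last row of each group
  is_last <- c(is_first[-1], TRUE)
  # days between the last and first date of each group, in group order
  span <- as.numeric(s$date[is_last] - s$date[is_first])
  out <- s[is_first, ][span >= min_gap, ]
  rownames(out) <- NULL
  out
}

res <- confirm_first_last(dat)
res
```

Because the rows are sorted by date within each group, the first row of a group is its earliest reading, so only one row per id/value pair is returned even when several readings share the initial date.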

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread arun
Hi,
If `dat` is the dataset
library(dplyr)
dat %>%
    group_by(id, value) %>%
    arrange(date=as.Date(date)) %>%
    filter(any(c(abs(diff(as.Date(date))),NA)>31) & date == min(date))
#Source: local data frame [3 x 3]
#Groups: id, value
#
#  id   date value
#1  a 2000-01-01 x
#2  c 2000-09-10 y
#3 

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread John McKown
On Wed, Jul 16, 2014 at 8:51 AM, jim holtman wrote:
> I can reproduce what you requested, but there was the question about
> what happens with the multiple 'c-y' values.
>
>> require(data.table)
>> x <- read.table(text = 'id date value
> + a 2000-01-01 x
> + a 2000

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread Williams Scott
Thanks guys - amazingly prompt solutions from the R community as always. Yes, the c-y value reverts to just the first date event - the spirit of this is that I am trying to identify and confirm a list of diagnoses that a patient has coded in government administrative data. Once a diagnosis is made

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread John McKown
Thanks. So you only want a single entry with a given "id" & "value", even if there are multiple possible confirmations. Too bad about not being in an SQL database. I've already partially solved the problem using PostgreSQL. Just in case you, or others, might be interested, below is a transcript o

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread jim holtman
I can reproduce what you requested, but there was the question about what happens with the multiple 'c-y' values.
> require(data.table)
> x <- read.table(text = 'id date value
+ a 2000-01-01 x
+ a 2000-03-01 x
+ b 2000-11-11 w
+ c 2000-11-11 y
+ c 2000-10-01

Re: [R] how to subset based on other row values and multiplicity

2014-07-16 Thread John McKown
On Wed, Jul 16, 2014 at 8:07 AM, Williams Scott wrote:
> Hi R experts,
>
> I have a dataset as sampled below. Values are only regarded as 'confirmed'
> in an individual ('id') if they occur
> more than once at least 30 days apart.
>
> id date value
> a 2000-01-01 x
> a 2000-03-01 x
> b

[R] how to subset based on other row values and multiplicity

2014-07-16 Thread Williams Scott
Hi R experts,
I have a dataset as sampled below. Values are only regarded as 'confirmed' in an individual ('id') if they occur more than once at least 30 days apart.
id date value
a 2000-01-01 x
a 2000-03-01 x
b 2000-11-11 w
c 2000-11-11 y
c 2000-10-01 y
c 2000-09-10 y
c