Corrected version. I forgot that the count had to change 'after' eif==1:

# Simulated data frame: year from 1990 to 2003, for 5 different ids,
# each having one or two eif "events"
test <- data.frame(year = rep(1990:2003, 5),
                   id = gl(5, length(1990:2003)),
                   eif = as.vector(sapply(1:5, function(z) {
                       a <- rep(0, length(1990:2003))
                       a[sample(1:length(1990:2003), sample(1:2, 1))] <- 1
                       a
                   })))

# partition by 'id' and then by 'eif' changes
test.new <- do.call(rbind, lapply(split(test, test$id), function(.id) {
    # now by 'eif' changes: a new group starts on the row after each
    # eif == 1 (lagged eif, so two events in consecutive years also reset)
    do.call(rbind, lapply(split(.id, cumsum(c(0, head(.id$eif, -1)))), function(.eif) {
        cbind(.eif, conditional_time = seq_len(nrow(.eif)))
    }))
}))
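With about 1.9 million rows, the nested split/rbind may still be slow. The sketch below is one possible vectorised alternative, not a tested drop-in: it assumes the data frame is already sorted by id and then by year, and the function names add_conditional_time and drop_after_first_event are made up for illustration. The first function marks every row that starts a new spell (first row, change of id, or eif == 1 on the previous row) and counts rows from the most recent start. The second addresses request -2- in the quoted message by keeping, for each id, only the rows up to and including the first eif event.

```r
# Hypothetical helper: assumes 'df' has columns id and eif and is
# sorted by id, then year.
add_conditional_time <- function(df) {
    n <- nrow(df)
    # A spell starts on the first row, when the id changes, or on the
    # row right after an eif == 1.
    starts <- c(TRUE, df$id[-1] != df$id[-n] | df$eif[-n] == 1)
    spell <- cumsum(starts)
    # Row position minus the position where its spell started, plus one.
    df$conditional_time <- seq_len(n) - which(starts)[spell] + 1L
    df
}

# Hypothetical helper: keep rows up to and including the first event
# of each id, drop every later year for that id.
drop_after_first_event <- function(df) {
    # Number of events strictly before the current row, within each id.
    prior_events <- ave(df$eif, df$id,
                        FUN = function(e) cumsum(c(0, head(e, -1))))
    df[prior_events == 0, , drop = FALSE]
}

# Usage on the two ids from the example in the quoted message
example <- data.frame(year = rep(1990:2003, 2),
                      id = rep(c(1010, 2010), each = 14),
                      eif = 0)
example$eif[c(3, 8, 26)] <- 1   # events in 1992 and 1997 (1010), 2001 (2010)
example.new <- add_conditional_time(example)
example.first <- drop_after_first_event(example)
```

Both helpers avoid growing a data frame piece by piece, which is usually where the time goes with rbind on many small chunks. Whether that is enough on the full data set would need to be timed.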
On Sat, May 9, 2009 at 1:40 PM, Vincent Arel-Bundock <vincent.a...@gmail.com> wrote:

> Hi everyone,
>
> Please forgive me if my question is simple and my code terrible, I'm new to
> R. I am not looking for a ready-made answer, but I would really appreciate
> it if someone could share conceptual hints for programming, or point me
> toward an R function/package that could speed up my processing time.
>
> Thanks a lot for your help!
>
> ##
>
> My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9
> million id-year observations.
>
> I would like to do 2 things:
>
> -1- I want to create a 'conditional_time' variable, which increases in
> increments of 1 every year, but which resets during year(t) if event 'eif'
> occurred for this 'id' at year(t-1). It should also reset when we switch to
> a new 'id'. For example:
>
> dataframe = test
>
> year  id    eif  conditional_time
> 1990  1010  0    1
> 1991  1010  0    2
> 1992  1010  1    3
> 1993  1010  0    1
> 1994  1010  0    2
> 1995  1010  0    3
> 1996  1010  0    4
> 1997  1010  1    5
> 1998  1010  0    1
> 1999  1010  0    2
> 2000  1010  0    3
> 2001  1010  0    4
> 2002  1010  0    5
> 2003  1010  0    6
> 1990  2010  0    1
> 1991  2010  0    2
> 1992  2010  0    3
> 1993  2010  0    4
> 1994  2010  0    5
> 1995  2010  0    6
> 1996  2010  0    7
> 1997  2010  0    8
> 1998  2010  0    9
> 1999  2010  0    10
> 2000  2010  0    11
> 2001  2010  1    12
> 2002  2010  0    1
> 2003  2010  0    2
>
> -2- In a copy of the original dataframe, drop all id-year rows that
> correspond to years after a given id has experienced its first 'eif' event.
>
> I have written the code below to take care of -1-, but it is incredibly
> inefficient. Given the size of my database, and considering how slow my
> computer is, I don't think it's practical to use it. Also, it depends on
> correct sorting of the dataframe, which might generate errors.
>
> ##
>
> for (i in 1:nrow(test)) {
>     if (i == 1) {                              # If first id-year
>         cond_time <- 1
>         test[i, 4] <- cond_time
>     } else if (test[i-1, 2] != test[i, 2]) {   # If new id (column 2 is 'id')
>         cond_time <- 1
>         test[i, 4] <- cond_time
>     } else {                                   # Same id as previous row
>         if (test[i, 3] == 0) {                 # No event this year: keep counting
>             test[i, 4] <- sum(cond_time, 1)
>             cond_time <- test[i, 4]            # column 4 is 'conditional_time'
>         } else {                               # Event this year: reset next year
>             test[i, 4] <- sum(cond_time, 1)
>             cond_time <- 0
>         }
>     }
> }
>
> --
> Vincent Arel
> M.A. Student, McGill University
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?