That will teach me to post without a double-check.

On 09/05/09 3:11 PM, "Finak Greg" <greg.fi...@ircm.qc.ca> wrote:
Assuming the year column has complete data and doesn't skip a year, the following should take care of 1):

# Simulated data frame: year from 1990 to 2003, for 5 different ids, each having one or two eif "events"
test <- data.frame(year = rep(1990:2003, 5),
                   id = gl(5, length(1990:2003)),
                   eif = as.vector(sapply(1:5, function(z) {
                     a <- rep(0, length(1990:2003))
                     a[sample(1:length(1990:2003), sample(1:2, 1))] <- 1
                     a
                   })))
# Generate the "conditional_time" column (first attempt, buggy)
test <- do.call("rbind", lapply(split(test, test$id), function(z) {
  s <- 0
  data.frame(z, cond_time = sapply(z$eif, function(i) ifelse(i == 1, s <- 1, s <<- s + 1)))
}))

The above resets the count at eif==1 rather than after, and the assignment s<-1 is local when it should be s<<-1, so that it updates the s of the enclosing function. Thanks, David, for noting that. Corrected version:

# Run this on the simulated 'test', not on the output of the first attempt
test <- do.call("rbind", lapply(split(test, test$id), function(z) {
  s <- 0  # running count for this id
  data.frame(z, cond_time = sapply(z$eif, function(i)
    ifelse(i == 1,
           {l <- s + 1; s <<- 0; l},        # event: report the count, then reset
           {l <- s + 1; s <<- s + 1; l})))  # no event: keep counting
}))

Here sapply and lapply are generally faster than your "for" loop, mainly because they avoid assigning into the data frame one cell at a time (test[i, 4] <- ...). split() will split your data frame by the $id column (second argument). lapply() loops through the resulting list and generates the cond_time variable, reporting the count and then resetting it whenever eif==1, otherwise incrementing it, much as you have in your code.

If I understand 2) correctly, the following should do the trick:

test2 <- do.call("rbind", lapply(split(test, test$id), function(z) {
  first <- which(z$eif == 1)[1]           # first event row for this id (NA if none)
  if (is.na(first)) z else z[1:first, ]   # ids without any event are kept whole
}))

Similar to the former, but subsetting the rows of the data frame up to and including the first event, for each id. If the above is all you need, then 1) and 2) could be combined in a single call. Others will likely have a different approach.

Cheers,
--
Greg Finak
Post-Doctoral Research Associate
Computational Biology Unit
Institut des Recherches Cliniques de Montreal
Montreal, QC.

On 09/05/09 1:40 PM, "Vincent Arel-Bundock" <vincent.a...@gmail.com> wrote:

Hi everyone,

Please forgive me if my question is simple and my code terrible; I'm new to R.
I am not looking for a ready-made answer, but I would really appreciate it if someone could share conceptual hints for programming, or point me toward an R function/package that could speed up my processing time. Thanks a lot for your help!

##

My data frame includes the variables 'year', 'id', and 'eif' and has roughly 1.9 million id-year observations. I would like to do 2 things:

-1- I want to create a 'conditional_time' variable, which increases in increments of 1 every year, but which resets during year(t) if event 'eif' occurred for this 'id' at year(t-1). It should also reset when we switch to a new 'id'. For example (data frame 'test'):

year  id    eif  conditional_time
1990  1010  0    1
1991  1010  0    2
1992  1010  1    3
1993  1010  0    1
1994  1010  0    2
1995  1010  0    3
1996  1010  0    4
1997  1010  1    5
1998  1010  0    1
1999  1010  0    2
2000  1010  0    3
2001  1010  0    4
2002  1010  0    5
2003  1010  0    6
1990  2010  0    1
1991  2010  0    2
1992  2010  0    3
1993  2010  0    4
1994  2010  0    5
1995  2010  0    6
1996  2010  0    7
1997  2010  0    8
1998  2010  0    9
1999  2010  0    10
2000  2010  0    11
2001  2010  1    12
2002  2010  0    1
2003  2010  0    2

-2- In a copy of the original data frame, drop all id-year rows that correspond to years after a given id has experienced his first 'eif' event.

I have written the code below to take care of -1-, but it is incredibly inefficient. Given the size of my database, and considering how slow my computer is, I don't think it's practical to use it. Also, it depends on correct sorting of the data frame, which might generate errors.

##

test$conditional_time <- NA   # column 4, filled in by the loop
for (i in 1:nrow(test)) {
  if (i == 1) {                              # If first id-year
    cond_time <- 1
    test[i, 4] <- cond_time
  } else if (test[i-1, 2] != test[i, 2]) {   # If new id (compare the id column)
    cond_time <- 1
    test[i, 4] <- cond_time
  } else {                                   # Same id as previous row
    if (test[i, 3] == 0) {                   # No event this year: keep counting
      test[i, 4] <- cond_time + 1
      cond_time <- test[i, 4]
    } else {                                 # Event this year: reset next year
      test[i, 4] <- cond_time + 1
      cond_time <- 0
    }
  }
}

--
Vincent Arel M.A.
Student, McGill University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
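For task -1- in the question (the conditional_time counter), one loop-free alternative is to mark the row where each new counting "spell" starts (the first row of an id, or the row right after an eif event) and then measure each row's distance from the start of its spell. This is a minimal sketch, not code from the thread: the helper name add_cond_time() is hypothetical, and it assumes one row per id-year with eif coded 0/1. It sorts by id and year itself, so it does not rely on the input order.

```r
# Hypothetical helper (not from the thread): vectorized conditional_time.
# Assumes one row per id-year and eif coded as 0/1.
add_cond_time <- function(df) {
  df <- df[order(df$id, df$year), ]   # sort so each id's years are contiguous
  n <- nrow(df)
  # A new spell starts on the first row of each id ...
  new_id <- c(TRUE, df$id[-1] != df$id[-n])
  # ... or on the row right after an eif event
  after_event <- c(FALSE, df$eif[-n] == 1)
  start <- new_id | after_event
  spell <- cumsum(start)              # spell number for every row
  first_row <- which(start)[spell]    # index of the first row of each row's spell
  df$conditional_time <- seq_len(n) - first_row + 1L
  df
}

# Usage on the two ids from the example table in the question
example <- data.frame(year = rep(1990:2003, 2),
                      id   = rep(c(1010, 2010), each = 14),
                      eif  = 0)
example$eif[example$id == 1010 & example$year %in% c(1992, 1997)] <- 1
example$eif[example$id == 2010 & example$year == 2001] <- 1
out <- add_cond_time(example)
out
```

Because the work is done with whole-vector operations (cumsum, which, indexing) instead of one R function call per row, this should scale better to about 1.9 million rows than either the for loop or a per-id sapply(), though I have not benchmarked it.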
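For task -2- in the question (dropping the years after an id's first eif event), a vectorized sketch can count, within each id, how many events occurred strictly before each row, and keep only the rows where that count is zero. That keeps every row up to and including the first event, and keeps ids that never have an event whole. The helper name drop_after_first_event() is hypothetical, and the sketch assumes eif is coded 0/1; it uses base R's ave() for the per-id cumulative sum.

```r
# Hypothetical helper (not from the thread): keep rows up to and including
# each id's first eif event. Assumes eif is coded as 0/1.
drop_after_first_event <- function(df) {
  df <- df[order(df$id, df$year), ]
  # Number of events strictly before each row, computed within each id
  events_before <- ave(df$eif, df$id, FUN = function(e) cumsum(e) - e)
  df[events_before == 0, ]
}

# Usage: the example ids from the question plus a third id with no event
example <- data.frame(year = rep(1990:2003, 3),
                      id   = rep(c(1010, 2010, 3010), each = 14),
                      eif  = 0)
example$eif[example$id == 1010 & example$year %in% c(1992, 1997)] <- 1
example$eif[example$id == 2010 & example$year == 2001] <- 1
kept <- drop_after_first_event(example)
table(kept$id)   # 1010 keeps 1990-1992, 2010 keeps 1990-2001, 3010 keeps all years
```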