HI, You may try this: dat1<- read.table(text=" CustID TripDate Store Bread Butter Milk Eggs 1 2-Jan-12 a 2 0 2 1 1 6-Jan-12 c 0 3 3 0 1 9-Jan-12 a 3 3 0 0 1 31-Mar-13 a 3 0 0 0 2 31-Aug-12 a 0 3 3 0 2 24-Sep-12 a 3 3 0 0 2 25-Sep-12 b 3 0 0 0 ",sep="",header=TRUE,stringsAsFactors=FALSE) dat2<- dat1[,-c(1:3)]
res<- lapply(seq_len(ncol(dat2)),function(i) {x1<-cbind(dat1[,c(1:3)],dat2[,i]);colnames(x1)[4]<- colnames(dat2)[i];x2<-x1[x1[,4]!=0,];within(x2, {daysbetweentrips<-unlist(tapply(as.Date(x2$TripDate,"%d-%b-%y"),list(x2$CustID),function(x) c(NA,as.numeric(diff(x)))));previoustripstore<-ave(x2$Store,x2$CustID,FUN=function(x) c(NA,x[-length(x)]));Nexttripstore<- ave(x2$Store,x2$CustID,FUN=function(x) c(x[-1],NA))})}) res #[[1]] # CustID TripDate Store Bread Nexttripstore previoustripstore daysbetweentrips #1 1 2-Jan-12 a 2 a <NA> NA #3 1 9-Jan-12 a 3 a a 7 #4 1 31-Mar-13 a 3 <NA> a 447 #6 2 24-Sep-12 a 3 b <NA> NA #7 2 25-Sep-12 b 3 <NA> a 1 #[[2]] # CustID TripDate Store Butter Nexttripstore previoustripstore #2 1 6-Jan-12 c 3 a <NA> #3 1 9-Jan-12 a 3 <NA> c #5 2 31-Aug-12 a 3 a <NA> #6 2 24-Sep-12 a 3 <NA> a # daysbetweentrips #2 NA #3 3 #5 NA #6 24 #[[3]] # CustID TripDate Store Milk Nexttripstore previoustripstore daysbetweentrips #1 1 2-Jan-12 a 2 c <NA> NA #2 1 6-Jan-12 c 3 <NA> a 4 #5 2 31-Aug-12 a 3 <NA> <NA> NA #[[4]] # CustID TripDate Store Eggs Nexttripstore previoustripstore daysbetweentrips #1 1 2-Jan-12 a 1 <NA> <NA> NA A.K. Hi, I have a very quick question.. I have a data which has sales per category per trip of each customer at different store locations, like below..(dataset1 frome xcel attachment) CustID TripDate Store Bread Butter Milk Eggs 1 2-Jan-12 a 2 0 2 1 1 6-Jan-12 c 0 3 3 0 1 9-Jan-12 a 3 3 0 0 1 31-Mar-13 a 3 0 0 0 2 31-Aug-12 a 0 3 3 0 2 24-Sep-12 a 3 3 0 0 2 25-Sep-12 b 3 0 0 0 Here i have shown 4 items and their sales per customer per trip at each store... However, my data contains around 100 columns with item names.. All i need to do is following: 1. Create a separate dataframe for each item. That is, create 100 dataframs one for each item.. Within the dataframe for Butter, for example, will be contained columns 1-3 and Butter column, specifically filtered for rows where butter>0 in sales..(so rows 1,4,7 will be dropped from this dataframe)..Likewise for all items...(sample output for butter is: (dataset2) CustID TripDate Store Butter 1 6-Jan-12 c 3 1 9-Jan-12 a 3 2 31-Aug-12 a 3 2 24-Sep-12 a 3 2. In same loop, create new derived variables within each dataframe for each item... like create a lag variable for TripDate, create lag variable for storename in next trip, storename in previous trip etc... and also # days between trips to each store for each customer...(an example for Butter dataframe with new derived variables would be...) Dataset needs to be sorted by CustID, TripDate, Store before creating derived variables (dataset3)Book1.xlsx CustID TripDate Store Butter NextTripstore previoustripstore daysbetweentrips 1 6-Jan-12 c 3 a - - 1 9-Jan-12 a 3 - c - 2 31-Aug-12 a 3 a - - 2 24-Sep-12 a 3 - a 24 Point of creating multiple item level dataframes is, i will use them iteratively as i will perform some regression on these datasets, using same set of variables each time ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.