Re: [R] R dataframe and looping help

arun Mon, 02 Sep 2013 14:07:07 -0700

HI,
You may try this:

dat1<- read.table(text="
CustID TripDate Store Bread Butter Milk Eggs
1 2-Jan-12 a 2 0 2 1 
1 6-Jan-12 c 0 3 3 0 
1 9-Jan-12 a 3 3 0 0
1 31-Mar-13 a 3 0 0 0
2 31-Aug-12 a 0 3 3 0
2 24-Sep-12 a 3 3 0 0
2 25-Sep-12 b 3 0 0 0
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<- dat1[,-c(1:3)]


res<- lapply(seq_len(ncol(dat2)),function(i) 
{x1<-cbind(dat1[,c(1:3)],dat2[,i]);colnames(x1)[4]<- 
colnames(dat2)[i];x2<-x1[x1[,4]!=0,];within(x2, 
{daysbetweentrips<-unlist(tapply(as.Date(x2$TripDate,"%d-%b-%y"),list(x2$CustID),function(x)
 
c(NA,as.numeric(diff(x)))));previoustripstore<-ave(x2$Store,x2$CustID,FUN=function(x)
 c(NA,x[-length(x)]));Nexttripstore<- ave(x2$Store,x2$CustID,FUN=function(x) 
c(x[-1],NA))})})


 res
#[[1]]
 # CustID  TripDate Store Bread Nexttripstore previoustripstore daysbetweentrips
#1      1  2-Jan-12     a     2             a              <NA>               NA
#3      1  9-Jan-12     a     3             a                 a                7
#4      1 31-Mar-13     a     3          <NA>                 a              447
#6      2 24-Sep-12     a     3             b              <NA>               NA
#7      2 25-Sep-12     b     3          <NA>                 a                1

#[[2]]
 # CustID  TripDate Store Butter Nexttripstore previoustripstore
#2      1  6-Jan-12     c      3             a              <NA>
#3      1  9-Jan-12     a      3          <NA>                 c
#5      2 31-Aug-12     a      3             a              <NA>
#6      2 24-Sep-12     a      3          <NA>                 a
 # daysbetweentrips
#2               NA
#3                3
#5               NA
#6               24

#[[3]]
 # CustID  TripDate Store Milk Nexttripstore previoustripstore daysbetweentrips
#1      1  2-Jan-12     a    2             c              <NA>               NA
#2      1  6-Jan-12     c    3          <NA>                 a                4
#5      2 31-Aug-12     a    3          <NA>              <NA>               NA

#[[4]]
 # CustID TripDate Store Eggs Nexttripstore previoustripstore daysbetweentrips
#1      1 2-Jan-12     a    1          <NA>              <NA>               NA



A.K.


Hi, I have a very quick question.. I have a data which has sales per 
category per trip of each customer at different store locations, like 
below..(dataset1 frome xcel attachment) CustID  TripDate        Store   Bread   
Butter  Milk    Eggs
1       2-Jan-12          a     2       0       2       1
1       6-Jan-12          c     0       3       3       0
1       9-Jan-12          a     3       3       0       0
1       31-Mar-13 a     3       0       0       0
2       31-Aug-12 a     0       3       3       0
2       24-Sep-12 a     3       3       0       0
2       25-Sep-12 b     3       0       0       0 Here i have shown 4 items and 
their sales per customer per trip at each 
store... However, my data contains around 100 columns with item names.. 
All i need to do is following: 1. Create a separate dataframe for each item. 
That is, create 100 
dataframs one for each item.. Within the dataframe for Butter, for 
example, will be contained columns 1-3 and Butter column, specifically 
filtered for rows where butter>0 in sales..(so rows 1,4,7 will be 
dropped from this dataframe)..Likewise for all items...(sample output 
for butter is: (dataset2) CustID        TripDate        Store   Butter
1       6-Jan-12           c    3
1       9-Jan-12           a    3
2       31-Aug-12  a    3
2       24-Sep-12  a    3 2. In same loop, create new derived variables within 
each dataframe for 
each item... like create a lag variable for TripDate, create lag 
variable for storename in next trip, storename in previous trip etc... 
and also # days between trips to each store for each customer...(an 
example for Butter dataframe with new derived variables would be...)
Dataset needs to be sorted by CustID, TripDate, Store before creating 
derived variables (dataset3)Book1.xlsx CustID   TripDate        Store   Butter  
NextTripstore previoustripstore 
daysbetweentrips
1       6-Jan-12           c    3       a                     -        -
1       9-Jan-12           a    3       -                     c        -
2       31-Aug-12  a    3       a                     -        -
2       24-Sep-12  a    3       -                     a      24 Point of 
creating multiple item level dataframes is, i will use them 
iteratively as i will perform some regression on these datasets, using 
same set of variables each time

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R dataframe and looping help

Reply via email to