> On 15 Dec 2016, at 04:40, Brijesh Mishra <brijeshkmis...@gmail.com> wrote:
>
> Hi,
>
> I am trying to calculate growth rate (say, sales, though it is to be
> computed for many variables) in a panel data set. Problem is that I
> have missing data for many firms for many years. To put it simply, I
> have created this short dataframe (original df is much bigger)
>
> df1<-data.frame(co_code1=rep(c(1100, 1200, 1300), each=7),
> fyear1=rep(1990:1996, 3), sales1=rep(seq(1000,1600, by=100),3))
>
> # this gives me
>    co_code1 fyear1 sales1
> 1      1100   1990   1000
> 2      1100   1991   1100
> 3      1100   1992   1200
> 4      1100   1993   1300
> 5      1100   1994   1400
> 6      1100   1995   1500
> 7      1100   1996   1600
> 8      1200   1990   1000
> 9      1200   1991   1100
> 10     1200   1992   1200
> 11     1200   1993   1300
> 12     1200   1994   1400
> 13     1200   1995   1500
> 14     1200   1996   1600
> 15     1300   1990   1000
> 16     1300   1991   1100
> 17     1300   1992   1200
> 18     1300   1993   1300
> 19     1300   1994   1400
> 20     1300   1995   1500
> 21     1300   1996   1600
>
> # I am now removing a couple of rows
> df1<-df1[-c(5, 8), ]
> # the result is
>    co_code1 fyear1 sales1
> 1      1100   1990   1000
> 2      1100   1991   1100
> 3      1100   1992   1200
> 4      1100   1993   1300
> 6      1100   1995   1500
> 7      1100   1996   1600
> 9      1200   1991   1100
> 10     1200   1992   1200
> 11     1200   1993   1300
> 12     1200   1994   1400
> 13     1200   1995   1500
> 14     1200   1996   1600
> 15     1300   1990   1000
> 16     1300   1991   1100
> 17     1300   1992   1200
> 18     1300   1993   1300
> 19     1300   1994   1400
> 20     1300   1995   1500
> 21     1300   1996   1600
> # so 1994 for co_code1 1100 and 1990 for co_code1 1200 have been
> removed. If I try,
> d<-ddply(df1,"co_code1",transform, growth=c(NA,exp(diff(log(sales1)))-1)*100)
>
> # this apparently gives wrong results for the year 1995 (as shown
> below) as growth rates are computed considering yearly increment.
>
>    co_code1 fyear1 sales1    growth
> 1      1100   1990   1000        NA
> 2      1100   1991   1100 10.000000
> 3      1100   1992   1200  9.090909
> 4      1100   1993   1300  8.333333
> 5      1100   1995   1500 15.384615
> 6      1100   1996   1600  6.666667
> 7      1200   1991   1100        NA
> 8      1200   1992   1200  9.090909
> 9      1200   1993   1300  8.333333
> 10     1200   1994   1400  7.692308
> 11     1200   1995   1500  7.142857
> 12     1200   1996   1600  6.666667
> 13     1300   1990   1000        NA
> 14     1300   1991   1100 10.000000
> 15     1300   1992   1200  9.090909
> 16     1300   1993   1300  8.333333
> 17     1300   1994   1400  7.692308
> 18     1300   1995   1500  7.142857
> 19     1300   1996   1600  6.666667
> # I thought of using the formula only when the increment of fyear1 is
> only 1 while in a co_code1, by using this formula
>
> d<-ddply(df1,
>          "co_code1",
>          transform,
>          if(diff(fyear1)==1){
>            growth=(exp(diff(log(df1$sales1)))-1)*100
>          } else{
>            growth=NA
>          })
>
> But, this doesn't work. I am getting the following error.
>
> In if (diff(fyear1) == 1) { :
>   the condition has length > 1 and only the first element will be used
> (repeated a few times).
>
> # I have searched for a solution, but somehow couldn't get one. Hope
> that some kind soul will guide me here.
>
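The message quoted above arises because if() expects a single TRUE or FALSE, whereas diff(fyear1) == 1 yields one value per pair of consecutive rows. (From R 4.2.0 onwards this is an error rather than a warning.) A minimal sketch of the difference, using a made-up vector yrs that is not part of the original data:

```r
# Hypothetical years with a gap between 1993 and 1995
yrs <- c(1990, 1991, 1992, 1993, 1995, 1996)

# One logical value per pair of neighbouring years (length 5, not 1)
step_is_one <- diff(yrs) == 1

# if() wants a single logical value; ifelse() works element by element
flag <- ifelse(step_is_one, "consecutive", "gap")

print(step_is_one)
print(flag)
```

Only the step from 1993 to 1995 is flagged as a gap; every other step is treated as consecutive.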
In your case use ifelse() as explained by Rui.
But it can be done more easily, since fyear1 and co_code1 are synchronized
(the rows are ordered by firm and, within each firm, by year).
Add a new column to df1 like this

df1$growth <- c(NA,
                ifelse(diff(df1$fyear1)==1,
                       (exp(diff(log(df1$sales1)))-1)*100,
                       NA
                       )
                )

and display df1.

From your request I cannot determine if this is what you want.

regards,

Berend Hasselman

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
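The one-liner in the reply relies on the year difference at each firm boundary not being exactly 1. It would compute a spurious growth rate if, say, one firm ended in 1995 and the next started in 1996. Below is a sketch of a variant that also checks the firm code, assuming the rows are sorted by co_code1 and then by fyear1. The names same_firm and consecutive are introduced here for illustration only.

```r
# Rebuild the sample data from the original post and drop rows 5 and 8
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Assumption: rows must be ordered by firm, then by year
df1 <- df1[order(df1$co_code1, df1$fyear1), ]

# TRUE where a row belongs to the same firm as the row before it
same_firm <- diff(df1$co_code1) == 0
# TRUE where a row is exactly one year after the row before it
consecutive <- diff(df1$fyear1) == 1

# Growth in percent; NA for a firm's first row and after any gap in years
df1$growth <- c(NA,
                ifelse(same_firm & consecutive,
                       (exp(diff(log(df1$sales1))) - 1) * 100,
                       NA))
print(df1)
```

On the sample data this marks 1995 for firm 1100 as NA, along with the first available year of each firm, and leaves the other growth rates unchanged.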