Hi Joris,

The amount of a month ago is normally a single value from another row. I used 'sum<-sum + dataset[i,22]' because I would also like to reuse the code for other tables, and in some of those tables the value of last month can be the sum of values from several rows.
Thank you for your time.

Greetings,
Ian

-----Original message-----
From: joris meys [mailto:jorism...@gmail.com]
Sent: Monday 19 October 2009 16:12
To: Ian Willems
CC: r-help@r-project.org
Subject: Re: [R] how to get rid of 2 for-loops and optimize runtime

Hi Ian,

first of all, take a look at the functions sapply, mapply, lapply, tapply, ...: they are the more efficient way of implementing loops.

Second, could you elaborate a bit further on the data set: the amount of the month ago, is that one value from another row, or the sum of all values in the previous month? I saw in your example dataset that the last month has 2 rows, but couldn't figure out whether that's a typo or really means something. That's necessary information to optimize your code. 129s is indeed far too long for a simple action.

Cheers
Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems <ian.will...@uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> I want to calculate for each row the amount of the month ago. I use a matrix
> with 2100 rows and 22 columns (which is still a very small matrix; the number
> of rows of other matrices can easily be more than 100000).
>
> Table before
> Year  month  quarter  yearmonth  Service  ...  Amount
> 2009  9      Q3       092009     A        ...  120
> 2009  9      Q3       092009     B        ...  80
> 2009  8      Q3       082009     A        ...  40
> 2009  7      Q3       072009     A        ...  50
>
> The result I want
> Year  month  quarter  yearmonth  Service  ...  Amount  amount_lastmonth
> 2009  9      Q3       092009     A        ...  120     40
> 2009  9      Q3       092009     B        ...  80      ...
> 2009  8      Q3       082009     A        ...  40      50
> 2009  7      Q3       072009     A        ...  50      ...
>
> The table is not exactly the same, but it gives a good idea of what I have
> and what I want.
>
> The code I have written (see below) does what I want, but it is very, very
> slow. It takes 129s for 400 rows, and the time gets four times higher each
> time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and
> summaryRprof to analyse your code (output see below).
> But I don't really understand the output.
> I guess I need code that requires linear time and need to get rid of the 2
> for loops.
> Can someone help me or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2
> Windows XP Service Pack 3
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,5]= month
> dataset[,3]= year
> dataset[,22]= amount
> dataset[,14]= servicetype
>
> [CODE]
> # for each row of the matrix check if each row has..
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {      # .. the same service type
>       if (dataset[j,18] == dataset[i,18]) {    # .. the same department
>         if (dataset[j,5] == "1") {             # if month = 1, month ago is 12 and year is -1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3]-1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {
>           if ((dataset[j,5]-1) == dataset[i,5]) {   # if month != 1, month ago is month -1
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
> }
> [/CODE]
>
> > summaryRprof()
> $by.self
>                self.time self.pct total.time total.pct
> [.data.frame       33.92     26.2      80.90      62.5
> NextMethod         12.68      9.8      12.68       9.8
> [.factor            8.60      6.6      18.36      14.2
> Ops.factor          8.10      6.3      40.08      31.0
> sort.int            6.82      5.3      13.70      10.6
> [                   6.70      5.2      85.44      66.0
> names               6.54      5.1       6.54       5.1
> length              5.66      4.4       5.66       4.4
> ==                  5.04      3.9      44.92      34.7
> levels              4.80      3.7       5.56       4.3
> is.na               4.24      3.3       4.24       3.3
> dim                 3.66      2.8       3.66       2.8
> switch              3.60      2.8       3.80       2.9
> vector              2.68      2.1       8.02       6.2
> inherits            1.90      1.5       1.90       1.5
> any                 1.68      1.3       1.68       1.3
> noNA.levels         1.46      1.1       7.84       6.1
> .Call               1.40      1.1       1.40       1.1
> !                   1.26      1.0       1.26       1.0
> attr<-              1.06      0.8       1.06       0.8
> .subset             1.00      0.8       1.00       0.8
> class<-             0.82      0.6       0.82       0.6
> !=                  0.80      0.6       0.80       0.6
> levels.default      0.68      0.5       0.76       0.6
> all                 0.62      0.5       0.62       0.5
> <                   0.54      0.4       0.54       0.4
> -                   0.48      0.4       0.48       0.4
> is.factor           0.44      0.3       2.34       1.8
> .subset2            0.38      0.3       0.38       0.3
> attr                0.36      0.3       0.36       0.3
> is.character        0.28      0.2       0.28       0.2
> is.null             0.28      0.2       0.28       0.2
> |                   0.26      0.2       0.26       0.2
> oldClass<-          0.20      0.2       0.20       0.2
> is.atomic           0.16      0.1       0.16       0.1
> nzchar              0.10      0.1       0.10       0.1
> is.numeric          0.06      0.0       0.06       0.0
> oldClass            0.06      0.0       0.06       0.0
> (                   0.04      0.0       0.04       0.0
> [.data              0.02      0.0       0.02       0.0
>
> $by.total
>                total.time total.pct self.time self.pct
> [                   85.44      66.0      6.70      5.2
> [.data.frame        80.90      62.5     33.92     26.2
> ==                  44.92      34.7      5.04      3.9
> Ops.factor          40.08      31.0      8.10      6.3
> [.factor            18.36      14.2      8.60      6.6
> sort.int            13.70      10.6      6.82      5.3
> NextMethod          12.68       9.8     12.68      9.8
> vector               8.02       6.2      2.68      2.1
> noNA.levels          7.84       6.1      1.46      1.1
> names                6.54       5.1      6.54      5.1
> length               5.66       4.4      5.66      4.4
> levels               5.56       4.3      4.80      3.7
> is.na                4.24       3.3      4.24      3.3
> switch               3.80       2.9      3.60      2.8
> dim                  3.66       2.8      3.66      2.8
> is.factor            2.34       1.8      0.44      0.3
> inherits             1.90       1.5      1.90      1.5
> any                  1.68       1.3      1.68      1.3
> .Call                1.40       1.1      1.40      1.1
> !                    1.26       1.0      1.26      1.0
> attr<-               1.06       0.8      1.06      0.8
> .subset              1.00       0.8      1.00      0.8
> class<-              0.82       0.6      0.82      0.6
> !=                   0.80       0.6      0.80      0.6
> levels.default       0.76       0.6      0.68      0.5
> all                  0.62       0.5      0.62      0.5
> <                    0.54       0.4      0.54      0.4
> -                    0.48       0.4      0.48      0.4
> .subset2             0.38       0.3      0.38      0.3
> attr                 0.36       0.3      0.36      0.3
> is.character         0.28       0.2      0.28      0.2
> is.null              0.28       0.2      0.28      0.2
> |                    0.26       0.2      0.26      0.2
> oldClass<-           0.20       0.2      0.20      0.2
> is.atomic            0.16       0.1      0.16      0.1
> nzchar               0.10       0.1      0.10      0.1
> is.numeric           0.06       0.0      0.06      0.0
> oldClass             0.06       0.0      0.06      0.0
> (                    0.04       0.0      0.04      0.0
> [.data               0.02       0.0      0.02      0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
>   incomplete final line found on 'Rprof.out'
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
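
For what it is worth, the profile above spends most of its time in `[.data.frame`, `==` and `Ops.factor`, which suggests that the cost comes from indexing a data frame one cell at a time and comparing factors inside the double loop. One possible way around both loops, sketched here under assumptions, is to sum the amounts per service and month once with aggregate() and then look up the previous month for every row with match(). The toy data and the column names (year, month, service, amount, amount_lastmonth) are made up to resemble the table in the post, and the department column is left out for brevity, so this is an illustration rather than a drop-in replacement.

```r
# Toy data modelled on the table in the post; the column names are assumptions.
dataset <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 7),
  service = c("A", "B", "A", "A"),
  amount  = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# 1. Total amount per (service, year, month). Using a sum covers tables in
#    which last month's value is spread over several rows.
totals <- aggregate(amount ~ service + year + month, data = dataset, FUN = sum)

# 2. For every row, work out which year/month counts as "a month ago";
#    January wraps around to December of the previous year.
prev_month <- ifelse(dataset$month == 1, 12, dataset$month - 1)
prev_year  <- ifelse(dataset$month == 1, dataset$year - 1, dataset$year)

# 3. Look up the previous month's total through a combined text key.
key_totals <- paste(totals$service, totals$year, totals$month)
key_wanted <- paste(dataset$service, prev_year, prev_month)
dataset$amount_lastmonth <- totals$amount[match(key_wanted, key_totals)]

print(dataset)
```

Rows without a previous month get NA here, whereas the loop in the post leaves the sum at 0; replacing the NAs with 0 afterwards would reproduce that behaviour. To match on department as well, the department column would have to be added to the aggregate() formula and to both keys.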