> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Willems > Sent: Monday, October 19, 2009 6:50 AM > To: 'r-help@r-project.org' > Subject: [R] how to get rid of 2 for-loops and optimize runtime > > Short: get rid of the loops I use and optimize runtime > > Dear all, > > I want to calculate for each row the amount of the month ago. > I use a matrix with 2100 rows and 22 colums (which is still a > very small matrix. nrows of other matrixes can easily be more > then 100000) > > Table before > Year month quarter yearmonth Service ... Amount > 2009 9 Q3 092009 A > ... 120 > 2009 9 Q3 092009 B > ... 80 > 2009 8 Q3 082009 A > ... 40 > 2009 7 Q3 072009 A > ... 50 > > The result I want > Year month quarter yearmonth Service ... Amount amound_lastmonth > 2009 9 Q3 092009 A ... 120 40 > 2009 9 Q3 092009 B ... 80 ... > 2009 8 Q3 082009 A ... 40 50 > 2009 7 Q3 072009 A ... 50 ... > > Table is not exactly the same but gives a good idea what I > have and what I want > > The code I have written (see below) does what I want but it > is very very slow. It takes 129s for 400 rows. And the time > gets four times higher each time I double the amount of rows. > I'm new in programming in R, but I found that you can use > Rprof and summaryRprof to analyse your code (output see below) > But I don't really understand the output > I guess I need code that requires linear time and need to get > rid of the 2 for loops. > can someone help me or tell me what else I can do to optimize > my runtime > > I use R 2.9.2 > windows Xp service pack3 > > Thank you in advance > > Best regards, > > Willems Ian > > > ***************************** > dataset[,5]= month > dataset[,3]= year > dataset[,22]= amount > dataset[,14]= servicetype > > [CODE] > #for each row of the matrix check if each row has.. > > for (j in 1:Number_rows) { > + sum<-0 > + for(i in 1:Number_rows){ > + if (dataset[j,14]== dataset[i,14]) #..the same service type > + {if (dataset[j,18]== dataset[i,18]) # .. the same department > + {if (dataset[j,5]== "1") # if month=1, month ago is > 12 and year is -1 > + {if ("12"== dataset[i,5]) > + {if ((dataset[j,3]-1)== dataset[i,3]) > + > + { sum<-sum + dataset[i,22]} > + }} > + else { > + if ((dataset[j,5]-1)== dataset[i,5]) " if month != 1, > month ago is month -1 > + { if (dataset[j,3]== dataset[i,3]) > + {sum<-sum + dataset[i,22]} > + }}}}}}
match() is often useful for quickly finding the locations of many items in a vector. It has no special methods for data.frames so you must combine the columns of interest into a character vector of keys and use match on the key vectors. E.g. # your test data in a format that mail readers # can copy and paste into R: d <- read.table(header=TRUE, textConnection(" Year month quarter yearmonth Service Amount 2009 9 Q3 092009 A 120 2009 9 Q3 092009 B 80 2009 8 Q3 082009 A 40 2009 7 Q3 072009 A 50 ")) # The key functions dKey <- function(d) { with(d, paste(d$Year, d$month, d$Service, sep=";")) } keyThisMonth <- function(d)dKey(d) keyPrevMonth <- function(d) { stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service)) isJan <- d$month==1 d$Year[isJan] <- d$Year[isJan] - 1 d$month[isJan] <- 12 d$month[!isJan] <- d$month[!isJan] - 1 dKey(d) } # Make the new column: d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d), keyThisMonth(d)) ] # The result print(d) Year month quarter yearmonth Service Amount AmountPrevMonth 1 2009 9 Q3 92009 A 120 40 2 2009 9 Q3 92009 B 80 NA 3 2009 8 Q3 82009 A 40 50 4 2009 7 Q3 72009 A 50 NA This assumes there is only one match per row. Is this the result you are looking for? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > [\Code] > > > summaryRprof() > $by.self > self.time self.pct total.time total.pct > [.data.frame 33.92 26.2 80.90 62.5 > NextMethod 12.68 9.8 12.68 9.8 > [.factor 8.60 6.6 18.36 14.2 > Ops.factor 8.10 6.3 40.08 31.0 > sort.int 6.82 5.3 13.70 10.6 > [ 6.70 5.2 85.44 66.0 > names 6.54 5.1 6.54 5.1 > length 5.66 4.4 5.66 4.4 > == 5.04 3.9 44.92 34.7 > levels 4.80 3.7 5.56 4.3 > is.na 4.24 3.3 4.24 3.3 > dim 3.66 2.8 3.66 2.8 > switch 3.60 2.8 3.80 2.9 > vector 2.68 2.1 8.02 6.2 > inherits 1.90 1.5 1.90 1.5 > any 1.68 1.3 1.68 1.3 > noNA.levels 1.46 1.1 7.84 6.1 > .Call 1.40 1.1 1.40 1.1 > ! 1.26 1.0 1.26 1.0 > attr<- 1.06 0.8 1.06 0.8 > .subset 1.00 0.8 1.00 0.8 > class<- 0.82 0.6 0.82 0.6 > != 0.80 0.6 0.80 0.6 > levels.default 0.68 0.5 0.76 0.6 > all 0.62 0.5 0.62 0.5 > < 0.54 0.4 0.54 0.4 > - 0.48 0.4 0.48 0.4 > is.factor 0.44 0.3 2.34 1.8 > .subset2 0.38 0.3 0.38 0.3 > attr 0.36 0.3 0.36 0.3 > is.character 0.28 0.2 0.28 0.2 > is.null 0.28 0.2 0.28 0.2 > | 0.26 0.2 0.26 0.2 > oldClass<- 0.20 0.2 0.20 0.2 > is.atomic 0.16 0.1 0.16 0.1 > nzchar 0.10 0.1 0.10 0.1 > is.numeric 0.06 0.0 0.06 0.0 > oldClass 0.06 0.0 0.06 0.0 > ( 0.04 0.0 0.04 0.0 > [.data 0.02 0.0 0.02 0.0 > > $by.total > total.time total.pct self.time self.pct > [ 85.44 66.0 6.70 5.2 > [.data.frame 80.90 62.5 33.92 26.2 > == 44.92 34.7 5.04 3.9 > Ops.factor 40.08 31.0 8.10 6.3 > [.factor 18.36 14.2 8.60 6.6 > sort.int 13.70 10.6 6.82 5.3 > NextMethod 12.68 9.8 12.68 9.8 > vector 8.02 6.2 2.68 2.1 > noNA.levels 7.84 6.1 1.46 1.1 > names 6.54 5.1 6.54 5.1 > length 5.66 4.4 5.66 4.4 > levels 5.56 4.3 4.80 3.7 > is.na 4.24 3.3 4.24 3.3 > switch 3.80 2.9 3.60 2.8 > dim 3.66 2.8 3.66 2.8 > is.factor 2.34 1.8 0.44 0.3 > inherits 1.90 1.5 1.90 1.5 > any 1.68 1.3 1.68 1.3 > .Call 1.40 1.1 1.40 1.1 > ! 1.26 1.0 1.26 1.0 > attr<- 1.06 0.8 1.06 0.8 > .subset 1.00 0.8 1.00 0.8 > class<- 0.82 0.6 0.82 0.6 > != 0.80 0.6 0.80 0.6 > levels.default 0.76 0.6 0.68 0.5 > all 0.62 0.5 0.62 0.5 > < 0.54 0.4 0.54 0.4 > - 0.48 0.4 0.48 0.4 > .subset2 0.38 0.3 0.38 0.3 > attr 0.36 0.3 0.36 0.3 > is.character 0.28 0.2 0.28 0.2 > is.null 0.28 0.2 0.28 0.2 > | 0.26 0.2 0.26 0.2 > oldClass<- 0.20 0.2 0.20 0.2 > is.atomic 0.16 0.1 0.16 0.1 > nzchar 0.10 0.1 0.10 0.1 > is.numeric 0.06 0.0 0.06 0.0 > oldClass 0.06 0.0 0.06 0.0 > ( 0.04 0.0 0.04 0.0 > [.data 0.02 0.0 0.02 0.0 > > $sampling.time > [1] 129.38 > > Warning message: > In readLines(filename, n = chunksize) : > incomplete final line found on 'Rprof.out' > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.