> -----Original Message----- > From: Ian Willems [mailto:ian.will...@uz.kuleuven.ac.be] > Sent: Tuesday, October 20, 2009 6:46 AM > To: William Dunlap; r-help@r-project.org > Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime > > Hi William, > > Your programs works perfect and very fast for the table I'm > using right now (only one match per row) > If I want to reuse this code other tables, it can match with > more than one row. > Is it possible to adapt your code easily, if I have to sum > the values of last month from different rows?
You can use aggregate() with one of those keys to sum up the values with a common key value. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > Thank u for your help > regards, > > Ian > > > -----Oorspronkelijk bericht----- > Van: William Dunlap [mailto:wdun...@tibco.com] > Verzonden: maandag 19 oktober 2009 18:08 > Aan: Ian Willems; r-help@r-project.org > Onderwerp: RE: [R] how to get rid of 2 for-loops and optimize runtime > > > > -----Original Message----- > > From: r-help-boun...@r-project.org > > [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Willems > > Sent: Monday, October 19, 2009 6:50 AM > > To: 'r-help@r-project.org' > > Subject: [R] how to get rid of 2 for-loops and optimize runtime > > > > Short: get rid of the loops I use and optimize runtime > > > > Dear all, > > > > I want to calculate for each row the amount of the month ago. > > I use a matrix with 2100 rows and 22 colums (which is still a > > very small matrix. nrows of other matrixes can easily be more > > then 100000) > > > > Table before > > Year month quarter yearmonth Service ... Amount > > 2009 9 Q3 092009 A > > ... 120 > > 2009 9 Q3 092009 B > > ... 80 > > 2009 8 Q3 082009 A > > ... 40 > > 2009 7 Q3 072009 A > > ... 50 > > > > The result I want > > Year month quarter yearmonth Service ... Amount > amound_lastmonth > > 2009 9 Q3 092009 A ... 120 40 > > 2009 9 Q3 092009 B ... 80 ... > > 2009 8 Q3 082009 A ... 40 50 > > 2009 7 Q3 072009 A ... 50 ... > > > > Table is not exactly the same but gives a good idea what I > > have and what I want > > > > The code I have written (see below) does what I want but it > > is very very slow. It takes 129s for 400 rows. And the time > > gets four times higher each time I double the amount of rows. > > I'm new in programming in R, but I found that you can use > > Rprof and summaryRprof to analyse your code (output see below) > > But I don't really understand the output > > I guess I need code that requires linear time and need to get > > rid of the 2 for loops. > > can someone help me or tell me what else I can do to optimize > > my runtime > > > > I use R 2.9.2 > > windows Xp service pack3 > > > > Thank you in advance > > > > Best regards, > > > > Willems Ian > > > > > > ***************************** > > dataset[,5]= month > > dataset[,3]= year > > dataset[,22]= amount > > dataset[,14]= servicetype > > > > [CODE] > > #for each row of the matrix check if each row has.. > > > for (j in 1:Number_rows) { > > + sum<-0 > > + for(i in 1:Number_rows){ > > + if (dataset[j,14]== dataset[i,14]) #..the same service type > > + {if (dataset[j,18]== dataset[i,18]) # .. the same department > > + {if (dataset[j,5]== "1") # if month=1, month ago is > > 12 and year is -1 > > + {if ("12"== dataset[i,5]) > > + {if ((dataset[j,3]-1)== dataset[i,3]) > > + > > + { sum<-sum + dataset[i,22]} > > + }} > > + else { > > + if ((dataset[j,5]-1)== dataset[i,5]) " if month != 1, > > month ago is month -1 > > + { if (dataset[j,3]== dataset[i,3]) > > + {sum<-sum + dataset[i,22]} > > + }}}}}} > > match() is often useful for quickly finding the locations of > many items in a vector. It has no special methods for data.frames > so you must combine the columns of interest into a character > vector of keys and use match on the key vectors. E.g. > > # your test data in a format that mail readers > # can copy and paste into R: > d <- read.table(header=TRUE, textConnection(" > Year month quarter yearmonth Service Amount > 2009 9 Q3 092009 A 120 > 2009 9 Q3 092009 B 80 > 2009 8 Q3 082009 A 40 > 2009 7 Q3 072009 A 50 > ")) > # The key functions > dKey <- function(d) { > with(d, paste(d$Year, d$month, d$Service, sep=";")) > } > keyThisMonth <- function(d)dKey(d) > keyPrevMonth <- function(d) { > stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service)) > isJan <- d$month==1 > d$Year[isJan] <- d$Year[isJan] - 1 > d$month[isJan] <- 12 > d$month[!isJan] <- d$month[!isJan] - 1 > dKey(d) > } > # Make the new column: > d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d), > keyThisMonth(d)) ] > # The result > print(d) > > Year month quarter yearmonth Service Amount AmountPrevMonth > 1 2009 9 Q3 92009 A 120 40 > 2 2009 9 Q3 92009 B 80 NA > 3 2009 8 Q3 82009 A 40 50 > 4 2009 7 Q3 72009 A 50 NA > > This assumes there is only one match per row. Is this the > result you are looking for? > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > > > > [\Code] > > > > > summaryRprof() > > $by.self > > self.time self.pct total.time total.pct > > [.data.frame 33.92 26.2 80.90 62.5 > > NextMethod 12.68 9.8 12.68 9.8 > > [.factor 8.60 6.6 18.36 14.2 > > Ops.factor 8.10 6.3 40.08 31.0 > > sort.int 6.82 5.3 13.70 10.6 > > [ 6.70 5.2 85.44 66.0 > > names 6.54 5.1 6.54 5.1 > > length 5.66 4.4 5.66 4.4 > > == 5.04 3.9 44.92 34.7 > > levels 4.80 3.7 5.56 4.3 > > is.na 4.24 3.3 4.24 3.3 > > dim 3.66 2.8 3.66 2.8 > > switch 3.60 2.8 3.80 2.9 > > vector 2.68 2.1 8.02 6.2 > > inherits 1.90 1.5 1.90 1.5 > > any 1.68 1.3 1.68 1.3 > > noNA.levels 1.46 1.1 7.84 6.1 > > .Call 1.40 1.1 1.40 1.1 > > ! 1.26 1.0 1.26 1.0 > > attr<- 1.06 0.8 1.06 0.8 > > .subset 1.00 0.8 1.00 0.8 > > class<- 0.82 0.6 0.82 0.6 > > != 0.80 0.6 0.80 0.6 > > levels.default 0.68 0.5 0.76 0.6 > > all 0.62 0.5 0.62 0.5 > > < 0.54 0.4 0.54 0.4 > > - 0.48 0.4 0.48 0.4 > > is.factor 0.44 0.3 2.34 1.8 > > .subset2 0.38 0.3 0.38 0.3 > > attr 0.36 0.3 0.36 0.3 > > is.character 0.28 0.2 0.28 0.2 > > is.null 0.28 0.2 0.28 0.2 > > | 0.26 0.2 0.26 0.2 > > oldClass<- 0.20 0.2 0.20 0.2 > > is.atomic 0.16 0.1 0.16 0.1 > > nzchar 0.10 0.1 0.10 0.1 > > is.numeric 0.06 0.0 0.06 0.0 > > oldClass 0.06 0.0 0.06 0.0 > > ( 0.04 0.0 0.04 0.0 > > [.data 0.02 0.0 0.02 0.0 > > > > $by.total > > total.time total.pct self.time self.pct > > [ 85.44 66.0 6.70 5.2 > > [.data.frame 80.90 62.5 33.92 26.2 > > == 44.92 34.7 5.04 3.9 > > Ops.factor 40.08 31.0 8.10 6.3 > > [.factor 18.36 14.2 8.60 6.6 > > sort.int 13.70 10.6 6.82 5.3 > > NextMethod 12.68 9.8 12.68 9.8 > > vector 8.02 6.2 2.68 2.1 > > noNA.levels 7.84 6.1 1.46 1.1 > > names 6.54 5.1 6.54 5.1 > > length 5.66 4.4 5.66 4.4 > > levels 5.56 4.3 4.80 3.7 > > is.na 4.24 3.3 4.24 3.3 > > switch 3.80 2.9 3.60 2.8 > > dim 3.66 2.8 3.66 2.8 > > is.factor 2.34 1.8 0.44 0.3 > > inherits 1.90 1.5 1.90 1.5 > > any 1.68 1.3 1.68 1.3 > > .Call 1.40 1.1 1.40 1.1 > > ! 1.26 1.0 1.26 1.0 > > attr<- 1.06 0.8 1.06 0.8 > > .subset 1.00 0.8 1.00 0.8 > > class<- 0.82 0.6 0.82 0.6 > > != 0.80 0.6 0.80 0.6 > > levels.default 0.76 0.6 0.68 0.5 > > all 0.62 0.5 0.62 0.5 > > < 0.54 0.4 0.54 0.4 > > - 0.48 0.4 0.48 0.4 > > .subset2 0.38 0.3 0.38 0.3 > > attr 0.36 0.3 0.36 0.3 > > is.character 0.28 0.2 0.28 0.2 > > is.null 0.28 0.2 0.28 0.2 > > | 0.26 0.2 0.26 0.2 > > oldClass<- 0.20 0.2 0.20 0.2 > > is.atomic 0.16 0.1 0.16 0.1 > > nzchar 0.10 0.1 0.10 0.1 > > is.numeric 0.06 0.0 0.06 0.0 > > oldClass 0.06 0.0 0.06 0.0 > > ( 0.04 0.0 0.04 0.0 > > [.data 0.02 0.0 0.02 0.0 > > > > $sampling.time > > [1] 129.38 > > > > Warning message: > > In readLines(filename, n = chunksize) : > > incomplete final line found on 'Rprof.out' > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.