That _is_ interesting. Reduce() calls the summing function ("+") repeatedly at the interpreted level, so I would not expect this. Can you check whether most of the time for my "vectorized" version is spent in the do.call(cbind, ...) step, as I would guess? Otherwise this sounds strange, since rowSums() is specifically built for speed, or so its help page says. I also assume z is as I constructed it.
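Bert's question of whether the do.call(cbind, ...) step dominates can be checked by timing the two stages separately. This is a minimal sketch that rebuilds a list like z from random data (set.seed() is added only for repeatability). Timings vary by machine and OS, so no output is shown.

```r
# Rebuild a list like Bert's z: 10,000 random 2 x 3 x 4 arrays.
set.seed(1)  # added only so repeated runs use the same data
z <- lapply(seq_len(10000), function(i) array(runif(24), dim = 2:4))

# Stage 1: bind the arrays into one matrix (each 3-d array becomes one 24-element column).
t_bind <- system.time(m <- do.call(cbind, z))

# Stage 2: the row totals themselves.
t_rows <- system.time(s <- rowSums(m))

# The Reduce() version, for comparison.
t_reduce <- system.time(ans3 <- Reduce("+", z))

# Elapsed seconds for each step.
print(c(cbind   = t_bind[["elapsed"]],
        rowSums = t_rows[["elapsed"]],
        Reduce  = t_reduce[["elapsed"]]))

# Both routes give the same totals, up to floating-point rounding.
stopifnot(isTRUE(all.equal(array(s, dim = 2:4), ans3)))
```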
-- Bert

On Mon, Apr 16, 2012 at 3:01 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Apr 16, 2012, at 4:32 PM, Bert Gunter wrote:
>
>> David:
>>
>> Here is a comparison of the gains to be made by vectorization (again,
>> assuming I have interpreted your query correctly).
>>
>> ## create a list of arrays
>>> z <- lapply(seq_len(10000),function(i)array(runif(24),dim=2:4))
>>
>> ## Using an apply type approach
>>> system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
>>    user  system elapsed
>>    0.62    0.00    0.62
>>
>> ## vectorizing via rowSums and cbind
>>> system.time(ans2 <-array(rowSums(do.call(cbind,z)),dim=2:4))
>>    user  system elapsed
>>    0.02    0.00    0.02
>>> identical(ans1,ans2)
>> [1] TRUE
>>
>
> It's also an example of how different OSes may perform differently. My
> Mac (an early 2008 model) is nowhere near as efficient with the second
> solution, despite being in the same ballpark with the first:
>
>> system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
>    user  system elapsed
>   0.841   0.007   0.851
>> system.time(ans2 <-array(rowSums(do.call(cbind,z)),dim=2:4))
>    user  system elapsed
>   0.132   0.003   0.145
>
> And on my system the Reduce strategy is fastest:
>
>> system.time(ans3 <- Reduce("+", z) )
>    user  system elapsed
>   0.129   0.001   0.134
>
> And the Reduce() strategy would preserve other object attributes,
> something I'm quite sure the re-dimensioning of rowSums(cbind(.)) could
> not preserve:
>
> L <- list( table(a, sample(a)) ,
>            table(a, sample(a)),
>            table(a, sample(a)),
>            table(a, sample(a)),
>            table(a, sample(a)) )
>
> str(Reduce("+", L) )
>  'table' int [1:3, 1:3] 1 1 3 4 0 1 0 4 1
>  - attr(*, "dimnames")=List of 2
>   ..$ a: chr [1:3] "a" "b" "c"
>   ..$  : chr [1:3] "a" "b" "c"
>
> str( array(rowSums(do.call(cbind,L)),dim=c(3,3)) )
>  num [1:3, 1:3] 5 5 5 5 5 5 5 5 5
>
> --
> David.
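A note on David's last comparison: 2-d tables are matrices, so cbind() keeps each one as 3 x 3. That makes do.call(cbind, L) a 3 x 15 matrix, and rowSums() returns only 3 totals, which array() then recycles into the all-5 result. Losing the attributes is not the only problem. The 3-d arrays in z are not affected, because cbind() binds each one as a single 24-element column, which is why Bert's identical() check passes. The sketch below assumes a <- c("a", "b", "c"), since a is not defined in the thread. It flattens each table with as.vector() before binding and then copies the shape and dimnames back by hand.

```r
# Assumption: 'a' is not defined in the thread; three distinct labels are used here.
set.seed(2)
a <- c("a", "b", "c")
L <- replicate(5, table(a, sample(a)), simplify = FALSE)

# Reduce("+") keeps the "table" class and the dimnames of the inputs.
r <- Reduce("+", L)

# cbind() keeps each 3 x 3 table as a matrix, so this is 3 x 15 and rowSums()
# returns 3 totals that array() recycles into all nine cells.
wrong <- array(rowSums(do.call(cbind, L)), dim = c(3, 3))

# Flatten each table to a plain vector first: 9 x 5, one row per cell.
flat <- rowSums(do.call(cbind, lapply(L, as.vector)))

# Put the shape and dimnames back by hand.
right <- array(flat, dim = dim(L[[1]]), dimnames = dimnames(L[[1]]))

str(r)
str(right)
```

The corrected vectorized result holds the same counts as Reduce("+", L). It is a plain numeric array rather than a "table", so Reduce() remains the simpler way to keep the attributes.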
>
>> Cheers,
>> Bert
>>
>> On Mon, Apr 16, 2012 at 1:19 PM, David A Vavra <dava...@verizon.net>
>> wrote:
>>>
>>> Thanks Bill,
>>>
>>> For reasons that aren't important here, I must start from a list.
>>> Computing the sum while generating the tables may be a solution, but it
>>> means doing something in one piece of code that is unrelated to the
>>> surrounding code. Bad practice where I'm from. If it's needed, it's
>>> needed, but if I can avoid doing so, I will.
>>>
>>> I haven't done any timing, but because of the extra get() and assign()
>>> operations, the non-loop implementation will likely suffer. It seems you
>>> have shown this to be true.
>>>
>>> DAV
>>>
>>> -----Original Message-----
>>> From: William Dunlap [mailto:wdun...@tibco.com]
>>> Sent: Monday, April 16, 2012 3:26 PM
>>> To: David A Vavra; 'Bert Gunter'
>>> Cc: r-help@r-project.org
>>> Subject: RE: [R] Effeciently sum 3d table
>>>
>>>> Example in partial code:
>>>>
>>>> env <- CreatEnv() # my own function
>>>> assign('final',T1-T1,envir=env)
>>>> L<-listOfTables
>>>>
>>>> lapply(L,function(t) {
>>>>      final <- get('final',envir=env) + t
>>>>      assign('final',final,envir=env)
>>>>      NULL
>>>> })
>>>
>>> First, finish writing that code so it runs and you can make sure its
>>> output is OK:
>>>
>>> L <- lapply(1:50000, function(i) array(i:(i+3), c(2,2))) # list of 50,000 2x2 matrices
>>> env <- new.env()
>>> assign('final', L[[1]] - L[[1]], envir=env)
>>> junk <- lapply(L, function(t) {
>>>     final <- get('final', envir=env) + t
>>>     assign('final', final, envir=env)
>>>     NULL
>>> })
>>> get('final', envir=env)
>>> #            [,1]       [,2]
>>> # [1,] 1250025000 1250125000
>>> # [2,] 1250075000 1250175000
>>> sum( (2:50001) ) # should be final[2,1]
>>> # [1] 1250075000
>>>
>>> You asked for something less "clunky". You are fighting the system by
>>> using get() and assign(); just use ordinary expression syntax to get and
>>> set variables:
>>>
>>> final <- L[[1]]
>>> for(i in seq_along(L)[-1]) final <- final + L[[i]]
>>> final
>>> #            [,1]       [,2]
>>> # [1,] 1250025000 1250125000
>>> # [2,] 1250075000 1250175000
>>>
>>> The former took 0.22 seconds on my machine, the latter 0.06.
>>>
>>> You don't have to compute the whole list of matrices before doing the
>>> sum; just add to the current sum when you have computed one matrix and
>>> then forget about it.
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>> -----Original Message-----
>>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David A Vavra
>>>> Sent: Monday, April 16, 2012 11:35 AM
>>>> To: 'Bert Gunter'
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] Effeciently sum 3d table
>>>>
>>>> Thanks Gunter,
>>>>
>>>> I mean what I think is the normal definition of 'sum', as in:
>>>>     T1 + T2 + T3 + ...
>>>> It never occurred to me that there would be a question.
>>>>
>>>> I have gotten the impression that a for loop is very inefficient.
>>>> Whenever I change them to lapply calls there is a noticeable
>>>> improvement in run time, for whatever reason. The problem with lapply
>>>> here is that I effectively need a global table to hold the final sum.
>>>> lapply also wants to return a value.
>>>>
>>>> You may be correct that in the long run the loop is best. There's a
>>>> lot of extraneous memory wastage holding all of the tables in a list,
>>>> as well as the return 'values'.
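Bill's last point also addresses DAV's concern about memory wastage. The sketch below covers the case where the tables can be produced one at a time. DAV says he must start from a list, so it may not fit his situation. make_matrix() is a hypothetical stand-in for whatever code builds each table. Here it just returns Bill's i-th test matrix, so the total should match his output.

```r
# Hypothetical stand-in for the code that produces one table at a time;
# here it returns Bill's i-th 2 x 2 matrix.
make_matrix <- function(i) array(i:(i + 3), c(2, 2))

n <- 50000
final <- make_matrix(1)
for (i in seq_len(n)[-1]) {
  # Add the new matrix to the running total. The matrix is not stored,
  # so nothing but the running total is kept between iterations.
  final <- final + make_matrix(i)
}
final
```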
>>>>
>>>> As an alternative, and given a pre-existing list of tables, I was
>>>> thinking of creating a temporary environment to hold the final result
>>>> so it could be passed globally to each lapply execution level, but that
>>>> seems clunky and wasteful as well.
>>>>
>>>> Example in partial code:
>>>>
>>>> env <- CreatEnv() # my own function
>>>> assign('final',T1-T1,envir=env)
>>>> L<-listOfTables
>>>>
>>>> lapply(L,function(t) {
>>>>      final <- get('final',envir=env) + t
>>>>      assign('final',final,envir=env)
>>>>      NULL
>>>> })
>>>>
>>>> But I was hoping for a more elegant and hopefully more efficient
>>>> solution. Greg's suggestion of using Reduce seems in order, but as yet
>>>> I'm unfamiliar with the function.
>>>>
>>>> DAV
>>>>
>>>> -----Original Message-----
>>>> From: Bert Gunter [mailto:gunter.ber...@gene.com]
>>>> Sent: Monday, April 16, 2012 12:42 PM
>>>> To: Greg Snow
>>>> Cc: David A Vavra; r-help@r-project.org
>>>> Subject: Re: [R] Effeciently sum 3d table
>>>>
>>>> Define "sum". Do you mean you want to get a single sum for each
>>>> array? Marginal sums for each array? A single array in which each
>>>> value is the sum of all the individual values at that position?
>>>>
>>>> Due thought and consideration for those trying to help, by formulating
>>>> your query carefully and concisely, vastly increases the chance of
>>>> getting a useful answer. See the posting guide; this is a skill that
>>>> needs to be learned, and the guide is quite helpful. And I must
>>>> acknowledge that it is a skill that I also have not yet mastered.
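Bert's three readings of "sum" correspond to three different R idioms. Here is a minimal sketch on a made-up list of three 2 x 3 x 4 arrays (arrs is hypothetical, not from the thread). Only the third reading is the cell-by-cell T1 + T2 + T3 + ... that DAV confirms he meant in his reply.

```r
# A small hypothetical list: three 2 x 3 x 4 arrays filled with 1s, 2s and 3s.
arrs <- lapply(1:3, function(i) array(i, dim = 2:4))

# Reading 1: one grand total per array.
totals <- sapply(arrs, sum)                             # 24, 48, 72

# Reading 2: marginal sums within each array, here over dims 1 and 2,
# leaving one total per level of the third dimension.
margins <- lapply(arrs, function(x) apply(x, 3, sum))

# Reading 3: the cell-by-cell sum T1 + T2 + T3.
cellwise <- Reduce("+", arrs)                           # every cell is 6
```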
>>>>
>>>> Concerning your query, I would only note that the two responses from
>>>> Greg and Petr that you received are unlikely to be significantly
>>>> faster than just using loops, since both are still essentially looping
>>>> at the interpreted level. Whether either gives you what you want, I do
>>>> not know.
>>>>
>>>> -- Bert
>>>>
>>>> On Mon, Apr 16, 2012 at 8:53 AM, Greg Snow <538...@gmail.com> wrote:
>>>>> Look at the Reduce function.
>>>>>
>>>>> On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <dava...@verizon.net> wrote:
>>>>>> I have a large number of 3d tables that I wish to sum.
>>>>>> Is there an efficient way to do this? Or perhaps a function I can
>>>>>> call?
>>>>>>
>>>>>> I tried using do.call("sum",listoftables) but that returns a single
>>>>>> value.
>>>>>>
>>>>>> So far, it seems only a loop will do the job.
>>>>>>
>>>>>> TIA,
>>>>>> DAV
>>>>
>>>> --
>>>>
>>>> Bert Gunter
>>>> Genentech Nonclinical Biostatistics
>>>>
>>>> Internal Contact Info:
>>>> Phone: 467-7374
>>>> Website:
>>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
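DAV's original attempt and Greg's pointer differ in how they combine the tables. sum() collapses all of its arguments into one number. Reduce("+", ...) adds the tables pairwise and keeps their shape. The sketch uses three made-up 2 x 2 matrices, and accumulate = TRUE is included only to show the intermediate totals. As Bert notes, Reduce() still loops over the list at the interpreted level, but each step is one whole-table addition.

```r
# Three small hypothetical 2 x 2 tables.
T1 <- matrix(1:4, 2)
T2 <- matrix(5:8, 2)
T3 <- matrix(9:12, 2)
tabs <- list(T1, T2, T3)

# sum() adds up every cell of every argument, so this is a single number,
# the same as sum(T1, T2, T3).
grand <- do.call("sum", tabs)                       # 78

# Reduce() applies "+" pairwise, (T1 + T2) + T3, keeping the 2 x 2 shape.
cellwise <- Reduce("+", tabs)

# accumulate = TRUE returns the running totals T1, T1 + T2, T1 + T2 + T3.
running <- Reduce("+", tabs, accumulate = TRUE)
```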
>
> David Winsemius, MD
> West Hartford, CT