[R] dramatic speed difference in lapply

Rob Forler Fri, 26 Feb 2010 12:00:49 -0800

So I have a function that does lapply's for me based on dimension. Currently
only works for length(pivotColumns)=2 because I haven't fixed the rbinds. I
have two versions. One runs WAYYY faster than the other. And I'm not sure
why.


Fast Version:

fedb.ddplyWrapper2Fast <- function(data, pivotColumns, listNameFunctions,
...){
    lapplyFunctionRecurse <- function(cdata, level=1, ...){
        if(level==1){

return(lapply(split(seq(nrow(cdata)),cdata[,pivotColumns[level]], drop=T),
function(x) lapplyFunctionRecurse(x, level+1, ...)))
        } else if (level==length(pivotColumns)) {
            #
return(lapply(split(cdata,data[cdata,pivotColumns[level]], drop=T),
function(x, ...) listNameFunctions(data[x,], ...)))
            return(lapply(split(cdata,data[cdata,pivotColumns[level]],
drop=T), function(x, ...) c(data[cdata[1],pivotColumns[2]],
data[cdata[1],pivotColumns[1]], sum(data[cdata,"A"], na.rm=T),
sum(data[cdata,"A"], na.rm=T))))
        } else {
            return(lapply(split(cdata,data[cdata,pivotColumns[level]],
drop=T), function(x) lapplyFunctionRecurse(x, level+1, ...)))
        }
    }
    result = lapplyFunctionRecurse(data, ...)
    matrix2 <- do.call('rbind', lapply(result, function(x)
do.call('rbind',x)))
    return(matrix2)
}


dat <- data.frame(D=sample(32000:33000, 666000,
T),Fid=sample(1:10,666000,T), A=sample(1:5,666000,T))
> temp = proc.time(); ret = fedb.ddplyWrapper2(dat, c("D", "Fid"),
function(x) c(sum(x[,"A"], na.rm=T), sum(x[,"A"], na.rm=T)));
proc.time()-temp
   user  system elapsed
 4.616   0.006   4.630
#note in thie case the anonymous function I pass in isn't used because I
hardcode the function into the lapply.

approx 4 seconds

This runs very fast. This runs very slow:

fedb.ddplyWrapper2 <- function(data, pivotColumns, listNameFunctions, ...){
    lapplyFunctionRecurse <- function(cdata, level=1, ...){
        if(level==1){

return(lapply(split(seq(nrow(cdata)),cdata[,pivotColumns[level]], drop=T),
function(x) lapplyFunctionRecurse(x, level+1, ...)))
        } else if (level==length(pivotColumns)) {
            #this line is different. it essentially calls the function you
pass in
            return(lapply(split(cdata,data[cdata,pivotColumns[level]],
drop=T), function(x, ...) listNameFunctions(data[x,], ...)))
        } else {
            return(lapply(split(cdata,data[cdata,pivotColumns[level]],
drop=T), function(x) lapplyFunctionRecurse(x, level+1, ...)))
        }
    }
    result = lapplyFunctionRecurse(data, ...)
    matrix2 <- do.call('rbind', lapply(result, function(x)
do.call('rbind',x)))
    return(matrix2)
}

dat <- data.frame(D=sample(32000:33000, 666000,
T),Fid=sample(1:10,666000,T), A=sample(1:5,666000,T))
> temp = proc.time(); ret = fedb.ddplyWrapper2(dat, c("D", "Fid"),
function(x) c(sum(x[,"A"], na.rm=T), sum(x[,"A"], na.rm=T)));
proc.time()-temp
   user  system elapsed
 16.346  65.059  81.680



Can anyone explain to me why there is a 4x time difference? I don't want to
have to hardcore into the recursion function, but if I have to I will.

Thanks,
Rob

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] dramatic speed difference in lapply

Reply via email to