So I have a function that does lapply's for me based on dimension. Currently only works for length(pivotColumns)=2 because I haven't fixed the rbinds. I have two versions. One runs WAYYY faster than the other. And I'm not sure why.
Fast Version: fedb.ddplyWrapper2Fast <- function(data, pivotColumns, listNameFunctions, ...){ lapplyFunctionRecurse <- function(cdata, level=1, ...){ if(level==1){ return(lapply(split(seq(nrow(cdata)),cdata[,pivotColumns[level]], drop=T), function(x) lapplyFunctionRecurse(x, level+1, ...))) } else if (level==length(pivotColumns)) { # return(lapply(split(cdata,data[cdata,pivotColumns[level]], drop=T), function(x, ...) listNameFunctions(data[x,], ...))) return(lapply(split(cdata,data[cdata,pivotColumns[level]], drop=T), function(x, ...) c(data[cdata[1],pivotColumns[2]], data[cdata[1],pivotColumns[1]], sum(data[cdata,"A"], na.rm=T), sum(data[cdata,"A"], na.rm=T)))) } else { return(lapply(split(cdata,data[cdata,pivotColumns[level]], drop=T), function(x) lapplyFunctionRecurse(x, level+1, ...))) } } result = lapplyFunctionRecurse(data, ...) matrix2 <- do.call('rbind', lapply(result, function(x) do.call('rbind',x))) return(matrix2) } dat <- data.frame(D=sample(32000:33000, 666000, T),Fid=sample(1:10,666000,T), A=sample(1:5,666000,T)) > temp = proc.time(); ret = fedb.ddplyWrapper2(dat, c("D", "Fid"), function(x) c(sum(x[,"A"], na.rm=T), sum(x[,"A"], na.rm=T))); proc.time()-temp user system elapsed 4.616 0.006 4.630 #note in thie case the anonymous function I pass in isn't used because I hardcode the function into the lapply. approx 4 seconds This runs very fast. This runs very slow: fedb.ddplyWrapper2 <- function(data, pivotColumns, listNameFunctions, ...){ lapplyFunctionRecurse <- function(cdata, level=1, ...){ if(level==1){ return(lapply(split(seq(nrow(cdata)),cdata[,pivotColumns[level]], drop=T), function(x) lapplyFunctionRecurse(x, level+1, ...))) } else if (level==length(pivotColumns)) { #this line is different. it essentially calls the function you pass in return(lapply(split(cdata,data[cdata,pivotColumns[level]], drop=T), function(x, ...) listNameFunctions(data[x,], ...))) } else { return(lapply(split(cdata,data[cdata,pivotColumns[level]], drop=T), function(x) lapplyFunctionRecurse(x, level+1, ...))) } } result = lapplyFunctionRecurse(data, ...) matrix2 <- do.call('rbind', lapply(result, function(x) do.call('rbind',x))) return(matrix2) } dat <- data.frame(D=sample(32000:33000, 666000, T),Fid=sample(1:10,666000,T), A=sample(1:5,666000,T)) > temp = proc.time(); ret = fedb.ddplyWrapper2(dat, c("D", "Fid"), function(x) c(sum(x[,"A"], na.rm=T), sum(x[,"A"], na.rm=T))); proc.time()-temp user system elapsed 16.346 65.059 81.680 Can anyone explain to me why there is a 4x time difference? I don't want to have to hardcore into the recursion function, but if I have to I will. Thanks, Rob [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.