Wouldn't that change how simplify='array' is handled?

> str(sapply(1:3, function(x)diag(x,5,2), simplify="array"))
 int [1:5, 1:2, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
> str(sapply(1:3, function(x)diag(x,5,2), simplify=TRUE))
 int [1:10, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
> str(sapply(1:3, function(x)diag(x,5,2), simplify=FALSE))
List of 3
 $ : int [1:5, 1:2] 1 0 0 0 0 0 1 0 0 0
 $ : int [1:5, 1:2] 2 0 0 0 0 0 2 0 0 0
 $ : int [1:5, 1:2] 3 0 0 0 0 0 3 0 0 0
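To make the concern concrete, here is a minimal sketch that runs the mySapply from the quoted message on the same diag() example. It assumes mySapply is defined exactly as posted; the names f, same_true, same_false, res and is_error are introduced only for this illustration. As far as I can tell, the logical cases agree with base sapply, while simplify="array" never reaches simplify2array because `&&` does not accept a character operand.

```r
# mySapply as proposed in the quoted message (the `simplify &&` variant).
mySapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) {
    FUN <- match.fun(FUN)
    answer <- lapply(X = X, FUN = FUN, ...)
    if (USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    if (simplify && length(answer))
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}

# Hypothetical helper for this illustration: the 5 x 2 matrix from the example.
f <- function(x) diag(x, 5, 2)

# Logical values of simplify give the same objects as base sapply.
same_true  <- identical(mySapply(1:3, f, simplify = TRUE),
                        sapply(1:3, f, simplify = TRUE))
same_false <- identical(mySapply(1:3, f, simplify = FALSE),
                        sapply(1:3, f, simplify = FALSE))

# A character value is not a valid operand of `&&`, so this call signals
# an error instead of returning a 5 x 2 x 3 array.
res <- tryCatch(mySapply(1:3, f, simplify = "array"),
                error = function(e) e)
is_error <- inherits(res, "error")
```

The base test, !identical(simplify, FALSE), treats every value other than FALSE as a request to simplify, which is what lets the string "array" through to simplify2array.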
Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 13, 2018 at 6:23 AM, Doran, Harold <hdo...@air.org> wrote:

> While working with sapply, the documentation states that the simplify
> argument will yield a vector, matrix etc "when possible". I was curious how
> the code actually defined "when possible" and see this within the function
>
> if (!identical(simplify, FALSE) && length(answer))
>
> This seems superfluous to me, in particular this part:
>
> !identical(simplify, FALSE)
>
> The preceding code could be reduced to
>
> if (simplify && length(answer))
>
> and it would not need to execute the call to identical in order to trigger
> the conditional execution, which is known from the user's simplify = TRUE
> or FALSE inputs. I *think* the extra call to identical is just unnecessary
> overhead in this instance.
>
> Take, for example, the following toy code and benchmark results with a
> small modification to sapply:
>
> myList <- list(a = rnorm(100), b = rnorm(100))
>
> answer <- lapply(X = myList, FUN = length)
> simplify = TRUE
>
> library(microbenchmark)
>
> mySapply <- function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE){
>     FUN <- match.fun(FUN)
>     answer <- lapply(X = X, FUN = FUN, ...)
>     if (USE.NAMES && is.character(X) && is.null(names(answer)))
>         names(answer) <- X
>     if (simplify && length(answer))
>         simplify2array(answer, higher = (simplify == "array"))
>     else answer
> }
>
>
> > microbenchmark(sapply(myList, length), times = 10000L)
> Unit: microseconds
>                    expr    min     lq     mean median     uq    max neval
>  sapply(myList, length) 14.156 15.572 16.67603 15.926 16.634 650.46 10000
> > microbenchmark(mySapply(myList, length), times = 10000L)
> Unit: microseconds
>                      expr    min     lq     mean median     uq      max neval
>  mySapply(myList, length) 13.095 14.864 16.02964 15.218 15.573 1671.804 10000
>
> My benchmark timings show a small improvement from that one change alone
> (median 15.218 vs. 15.926 microseconds), so the gain per call is nominal.
> In my actual work, the sapply function is called millions of times, and
> this additional overhead adds up to some extra overall computing time.
>
> I have done some limited testing on various real data to verify that both
> variants of sapply (base R and my modified version) yield identical objects
> when simplify is either TRUE or FALSE.
>
> Perhaps someone else sees a counterexample where my proposed fix causes
> sapply not to behave as expected.
>
> Harold
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.