Well thanks, Martin, and glad to see there is some potential here. This wasn't reported as a bug but, as you note, was originally a question, with an invitation to critique my code.
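
For anyone reading along, here is a minimal sketch of two points from the quoted thread below: how the new isFALSE() differs from identical(x, FALSE), and roughly what Bill's vapply() suggestion looks like. The names is_true, is_false and scores are hypothetical. The two predicate bodies are copied from the definitions Martin quotes, so this should also run on R versions before 3.5.0, and the data is made up.

```r
# Local copies of the quoted definitions, under hypothetical names so they
# do not mask base::isTRUE / base::isFALSE.
is_true  <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && x
is_false <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && !x

named_false <- c(a = FALSE)
is_false(named_false)          # TRUE: the names attribute is ignored
identical(named_false, FALSE)  # FALSE: identical() compares attributes too

is_false(NA)                   # FALSE: NA is rejected by !is.na(x)
is_false(c(FALSE, FALSE))      # FALSE: length must be exactly 1
is_false("array")              # FALSE: not a logical vector

# vapply() declares the type and length of each result up front and
# stops with an error if any call to FUN returns something else.
scores <- list(a = 1:3, b = 4:6)   # made-up data, for illustration only
avg <- vapply(scores, function(v) mean(v), numeric(1))
avg
```

Whether the more liberal test is acceptable depends on whether any caller relies on identical() rejecting vectors that carry attributes.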

On 3/14/18, 5:11 AM, "Martin Maechler" <maech...@stat.math.ethz.ch> wrote:

>>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>>>     on Tue, 13 Mar 2018 10:12:55 -0700 writes:
>
>> FYI, in R devel (to become 3.5.0), there's isFALSE() which will cut
>> some corners compared to identical():
>
>> > microbenchmark::microbenchmark(identical(FALSE, FALSE), isFALSE(FALSE))
>> Unit: nanoseconds
>>                     expr min   lq    mean median     uq   max neval
>>  identical(FALSE, FALSE) 984 1138 1694.13 1218.0 1337.5 13584   100
>>           isFALSE(FALSE) 713  761 1133.53  809.5  871.5 18619   100
>
>> > microbenchmark::microbenchmark(identical(TRUE, FALSE), isFALSE(TRUE))
>> Unit: nanoseconds
>>                    expr  min     lq    mean median   uq   max neval
>>  identical(TRUE, FALSE) 1009 1103.5 2228.20 1170.5 1357 14346   100
>>           isFALSE(TRUE)  718  760.0 1298.98  798.0  898 17782   100
>
>> > microbenchmark::microbenchmark(identical("array", FALSE), isFALSE("array"))
>> Unit: nanoseconds
>>                       expr min     lq    mean median     uq  max neval
>>  identical("array", FALSE) 975 1058.5 1257.95 1119.5 1250.0 9299   100
>>           isFALSE("array") 409  433.5  658.76  446.0  476.5 9383   100
>
>Thank you Henrik!
>
>The speed of the new isTRUE() and isFALSE() is indeed amazing
>compared to identical() which was written to be fast itself.
>
>Note that the new code goes back to a proposal by Hervé Pagès
>(of Bioconductor fame) in a thread with R core in April 2017.
>The goal of the new code actually *was* to allow call like
>
>   isTRUE(c(a = TRUE))
>
>to become TRUE rather than improving speed.
>The new source code is at the end of R/src/library/base/R/identical.R
>
>## NB: is.logical(.) will never dispatch:
>## -- base::is.logical(x) <==> typeof(x) == "logical"
>isTRUE <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && x
>isFALSE <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && !x
>
>and one *reason* this is so fast is that all 6 functions which
>are called are primitives :
>
>> sapply(codetools::findGlobals(isTRUE), function(fn) is.primitive(get(fn)))
>         !         &&         == is.logical      is.na     length
>      TRUE       TRUE       TRUE       TRUE       TRUE       TRUE
>
>and a 2nd reason is probably with the many recent improvements of the
>byte compiler.
>
>
>> That could probably be used also is sapply(). The difference is that
>> isFALSE() is a bit more liberal than identical(x, FALSE), e.g.
>
>> > isFALSE(c(a = FALSE))
>> [1] TRUE
>> > identical(c(a = FALSE), FALSE)
>> [1] FALSE
>
>> Assuming the latter is not an issue, there are 69 places in base R
>> where isFALSE() could be used:
>
>> $ grep -E "identical[(][^,]+,[ ]*FALSE[)]" -r --include="*.R" | grep -F "/R/" | wc
>>      69     326    5472
>
>> and another 59 where isTRUE() can be used:
>
>> $ grep -E "identical[(][^,]+,[ ]*TRUE[)]" -r --include="*.R" | grep -F "/R/" | wc
>>      59     307    5021
>
>Beautiful use of 'grep' -- thank you for those above, as well.
>It does need a quick manual check, but if I use the above grep
>from Emacs (via 'M-x grep') or even better via a TAGS table
>and M-x tags-query-replace I should be able to do the changes
>pretty quickly... and will start looking into that later today.
>
>Interestingly and to my great pleasure, the first part of the
>'Subject' of this mailing list thread, "Possible Improvement",
>*has* become true after all --
>
>-- thanks to Henrik !
>
>Martin Maechler
>ETH Zurich
>
>
>
>> On Tue, Mar 13, 2018 at 9:21 AM, Doran, Harold <hdo...@air.org> wrote:
>> > Quite possibly, and I'll look into that. Aside from the work I was
>> > doing, however, I wonder if there is a way such that sapply could avoid
>> > the overhead of having to call the identical function to determine the
>> > conditional path.
>> >
>> >
>> >
>> > From: William Dunlap [mailto:wdun...@tibco.com]
>> > Sent: Tuesday, March 13, 2018 12:14 PM
>> > To: Doran, Harold <hdo...@air.org>
>> > Cc: Martin Morgan <martin.mor...@roswellpark.org>; r-help@r-project.org
>> > Subject: Re: [R] Possible Improvement to sapply
>> >
>> > Could your code use vapply instead of sapply? vapply forces you to
>> > declare the type and dimensions of FUN's output and stops if any call
>> > to FUN does not match the declaration. It can use much less memory and
>> > time than sapply because it fills in the output array as it goes
>> > instead of calling lapply() and seeing how it could be simplified.
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com<http://tibco.com>
>> >
>> > On Tue, Mar 13, 2018 at 7:06 AM, Doran, Harold <hdo...@air.org<mailto:hdo...@air.org>> wrote:
>> > Martin
>> >
>> > In terms of context of the actual problem, sapply is called millions
>> > of times because the work involves scoring individual students who took
>> > a test. A score for student A is generated and then student B and such
>> > and there are millions of students. The psychometric process of scoring
>> > students is complex and our code makes use of sapply many times for each
>> > student.
>> >
>> > The toy example used length just to illustrate, our actual code
>> > doesn't do that. But your point is well taken, there may be a very good
>> > counterexample why my proposal doesn't achieve the goal is a
>> > generalizable way.
>> >
>
>
>[.................]
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.