This is a great example of how you can spend two hours troubleshooting
without figuring something out, yet a few minutes after you post to
R-devel the answer just jumps out at you (is there a word for this
other than "impatient"?).
Let me answer my own question. The discrepancy between my sum2() code
and the internal code for base::sum() is that the latter uses
LDOUBLE = long double (on some systems it is only double, cf.
https://github.com/wch/r-source/blob/trunk/src/nmath/nmath.h#L28-L33),
whereas my sum2() code uses plain double. Using long double, I can
reproduce the penalty of having NA_real_ with na.rm=FALSE:

> sum3 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  #define LDOUBLE long double
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  LDOUBLE sum = 0.0;
  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
  }
  return ScalarReal((double)sum);
')

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum3(x, narm=FALSE))
   user  system elapsed
   0.40    0.00    0.44

> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum3(y, narm=FALSE))
   user  system elapsed
   9.80    0.00    9.84

> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum3(z, narm=FALSE))
   user  system elapsed
   4.49    0.00    4.50

This might even be what the following comment (in the nmath.h lines
linked above) refers to:

/* Required by C99 but might be slow */
#ifdef HAVE_LONG_DOUBLE
# define LDOUBLE long double
#else
# define LDOUBLE double
#endif

So let me rephrase my question: Is there a way to avoid this penalty
when using 'long double'? Is this something the compiler can be
clever about, or is the only solution to not use 'long double'?
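The only mitigation I can think of so far is the following sketch (to
be clear, this is *not* what src/main/summary.c does, and the function
name and signature are made up for illustration): since NaN is
absorbing for addition, the loop can bail out as soon as the
accumulator goes NaN, so at most one slow long-double update is ever
paid for:

static double lsum_early_exit(const double *x, R_xlen_t n)
{
    /* Assumes the same LDOUBLE and ISNAN definitions as in R's
       sources (e.g. the nmath.h block quoted above). */
    LDOUBLE s = 0.0;
    for (R_xlen_t i = 0; i < n; i++) {
        s += x[i];
        /* NaN is absorbing for '+': once s is NaN the final sum is
           NaN, so further (slow) updates cannot change the result. */
        if (ISNAN(s)) break;
    }
    return (double) s;
}

The catch is that bailing out early may change which of NA or NaN is
returned when both occur in the input, but R already documents that
choice as platform dependent, and the DBL_MAX overflow checks in
rsum() (quoted below) would still work unchanged after the loop.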
/Henrik

On Sun, May 31, 2015 at 5:02 PM, Henrik Bengtsson
<henrik.bengts...@ucsf.edu> wrote:
> I'm observing that base::sum(x, na.rm=FALSE) for typeof(x) ==
> "double" is much more time consuming when there are missing values
> than when there are not. I'm observing this on both Windows and
> Linux, and it's quite surprising to me. Currently, my main suspect
> is the settings used when R was built. The second suspect is my
> brain. I hope that someone can clarify the results below and confirm
> whether or not they see the same. Note, this is for "doubles", so
> I'm not expecting early-stopping as for "integers" (where testing
> for NA is cheap).
>
> On R 3.2.0, on Windows (using the official CRAN builds), on Linux
> (locally built), and on OS X (official AT&T builds), I get:
>
>> x <- rep(0, 1e8)
>> stopifnot(typeof(x) == "double")
>> system.time(sum(x, na.rm=FALSE))
>    user  system elapsed
>    0.19    0.01    0.20
>
>> y <- rep(NA_real_, 1e8)
>> stopifnot(typeof(y) == "double")
>> system.time(sum(y, na.rm=FALSE))
>    user  system elapsed
>    9.54    0.00    9.55
>
>> z <- x; z[length(z)/2] <- NA_real_
>> stopifnot(typeof(z) == "double")
>> system.time(sum(z, na.rm=FALSE))
>    user  system elapsed
>    4.49    0.00    4.51
>
> Following the source code, I'm pretty sure the code
> (https://github.com/wch/r-source/blob/trunk/src/main/summary.c#L112-L128)
> performing the calculation is:
>
> static Rboolean rsum(double *x, R_xlen_t n, double *value, Rboolean narm)
> {
>     LDOUBLE s = 0.0;
>     Rboolean updated = FALSE;
>     for (R_xlen_t i = 0; i < n; i++) {
>         if (!narm || !ISNAN(x[i])) {
>             if (!updated) updated = TRUE;
>             s += x[i];
>         }
>     }
>     if (s > DBL_MAX) *value = R_PosInf;
>     else if (s < -DBL_MAX) *value = R_NegInf;
>     else *value = (double) s;
>     return updated;
> }
>
> In other words, when na.rm=FALSE, that inner for loop:
>
>     for (R_xlen_t i = 0; i < n; i++) {
>         if (!narm || !ISNAN(x[i])) {
>             if (!updated) updated = TRUE;
>             s += x[i];
>         }
>     }
>
> should effectively become (because !ISNAN(x[i]) "does not make a
> difference"):
>
>     for (R_xlen_t i = 0; i < n; i++) {
>         if (!narm) {
>             if (!updated) updated = TRUE;
>             s += x[i];
>         }
>     }
>
> That is, sum(x, na.rm=FALSE) basically spends its time on `s += x[i]`.
> Now, I have always been under the impression that summing with NAs
> is *not* more expensive than summing over regular (double) values,
> which is confirmed by the 'inline' example below, but the above
> benchmarking disagrees. It looks like there is a big overhead
> whenever the sum `s` is NA, which is supported by the fact that
> summing over 'z' costs half of what 'y' costs.
>
> Now, I *cannot* reproduce the above using the following 'inline'
> example:
>
>> sum2 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
>   double *x_ = REAL(x);
>   int narm_ = asLogical(narm);
>   int n = length(x);
>   double sum = 0;
>   for (R_xlen_t i = 0; i < n; i++) {
>     if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
>   }
>   return ScalarReal(sum);
> ')
>
>> x <- rep(0, 1e8)
>> stopifnot(typeof(x) == "double")
>> system.time(sum2(x, narm=FALSE))
>    user  system elapsed
>    0.16    0.00    0.16
>
>> y <- rep(NA_real_, 1e8)
>> stopifnot(typeof(y) == "double")
>> system.time(sum2(y, narm=FALSE))
>    user  system elapsed
>    0.16    0.00    0.15
>
>> z <- x; z[length(z)/2] <- NA_real_
>> stopifnot(typeof(z) == "double")
>> system.time(sum2(z, narm=FALSE))
>    user  system elapsed
>    0.16    0.00    0.15
>
> This is why I suspect it's related to how R was configured when it
> was built. What's going on? Can someone please shed some light on
> this?
>
> Thanks
>
> Henrik
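PS. The penalty can also be studied outside of R entirely. Below is a
minimal standalone C sketch of mine (not from R's sources; timings
are machine dependent, and the explanation in terms of a slow 80-bit
x87 path for NaN operands is my assumption) that compares
accumulating NaNs in double versus long double:

/* Compile with, e.g.:  gcc -O2 ldouble_nan.c -o ldouble_nan */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100000000  /* 1e8 elements, as in the R examples */

int main(void)
{
    double *x = malloc(N * sizeof *x);
    if (x == NULL) return 1;
    for (size_t i = 0; i < N; i++) x[i] = NAN;

    clock_t t0 = clock();
    double sd = 0.0;
    for (size_t i = 0; i < N; i++) sd += x[i];
    double td = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    long double sl = 0.0L;
    for (size_t i = 0; i < N; i++) sl += x[i];
    double tl = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Print the sums so the compiler cannot elide the loops. */
    printf("double:      %g s (sum = %g)\n", td, sd);
    printf("long double: %g s (sum = %Lg)\n", tl, sl);
    free(x);
    return 0;
}

If the long double loop is much slower than the double loop on a
given machine, that machine should also show the base::sum() versus
sum2() gap reported above.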