>>>>> Suharto Anggono via R-devel
>>>>>     on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:

> Chain of calls of C functions in coerce.c for as.character(<logical>) in R:
>
> do_asatomic
> ascommon
> coerceVector
> coerceToString
> StringFromLogical (for each element)
>
> The definition of 'StringFromLogical' in coerce.c :
>
> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     int w;
>     formatLogical(&x, 1, &w);
>     if (x == NA_LOGICAL) return NA_STRING;
>     else return mkChar(EncodeLogical(x, w));
> }
>
> The definition of 'EncodeLogical' in printutils.c :
>
> const char *EncodeLogical(int x, int w)
> {
>     static char buff[NB];
>     if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
>     else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
>     else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
>     buff[NB-1] = '\0';
>     return buff;
> }
>
> > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
> > system.time(as.character(L))
>    user  system elapsed
>    2.69    0.02    2.73
> > system.time(c("FALSE", "TRUE")[L+1])
>    user  system elapsed
>    0.15    0.04    0.20
> > system.time(c("FALSE", "TRUE")[L+1L])
>    user  system elapsed
>    0.08    0.05    0.13
> > L <- rep(NA, 10^7)
> > system.time(as.character(L))
>    user  system elapsed
>    0.11    0.00    0.11
> > system.time(c("FALSE", "TRUE")[L+1])
>    user  system elapsed
>    0.16    0.06    0.22
> > system.time(c("FALSE", "TRUE")[L+1L])
>    user  system elapsed
>    0.09    0.03    0.12
>
> `as.character` of a logical vector that is all NA is fast enough.
> It appears that the call to 'formatLogical' inside the C function
> 'StringFromLogical' does not introduce much slowdown.
>
> I found that using a string literal inside the C function 'StringFromLogical', by replacing
>     EncodeLogical(x, w)
> with
>     x ? "TRUE" : "FALSE"
> (and the call to 'formatLogical' is not needed anymore), makes it faster.

indeed! ... and we also notice that the 'w' argument is not
needed anymore either, and that makes sense: At this point, when
you know you have an R logical value, there are only three
possibilities and no reason ever to warn about the conversion.

> Alternatively,

or in addition !

> a "fast path" could be introduced in 'EncodeLogical', potentially also benefiting format() in R.
> For example, without replacing existing code, the following fragment could be inserted.
>
> if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
> else if(x) {if(w == 4) return "TRUE";}
> else {if(w == 5) return "FALSE";}
>
> However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster than as.character(L) .
>
> Precomputing or caching possible results of the C function 'StringFromLogical' allows as.character(L) to be as fast as c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be changed to
>
> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     static SEXP TrueCh, FalseCh;
>     if (x == NA_LOGICAL) return NA_STRING;
>     else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
>     else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
> }

Indeed, and something along this line (storing the other two
constant strings) was also my thought when seeing the
    mkChar(x ? "TRUE" : "FALSE")
you implicitly proposed above.

I'm looking into applying both speedups;
thank you very much, Suharto!

Martin

--
Martin Maechler
ETH Zurich  and  R Core team

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
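
For readers who want to try the caching idea outside of R, here is a minimal standalone sketch in plain C. It is not R's code: `fake_mkchar`, `NA_LOGICAL_SKETCH`, `string_from_logical_plain`, `string_from_logical_cached` and `count_lookups` are hypothetical names made up for this illustration, the string pool is only a rough stand-in for the per-call work that mkChar() does, and NULL plays the role of NA_STRING. It only shows that the cached variant does the per-string lookup at most twice, however long the input is.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for NA_LOGICAL (R represents a logical NA as INT_MIN). */
#define NA_LOGICAL_SKETCH INT_MIN

/* Tiny string pool; hypothetical stand-in for what mkChar() consults. */
#define POOL_MAX 8
static const char *pool[POOL_MAX];
static int pool_len = 0;
static long lookups = 0;          /* number of fake_mkchar() calls so far */

/* Return the pooled copy of s, adding it on first use.
   Only string literals are passed in this sketch, so storing s is safe. */
const char *fake_mkchar(const char *s)
{
    lookups++;
    for (int i = 0; i < pool_len; i++)
        if (strcmp(pool[i], s) == 0) return pool[i];
    assert(pool_len < POOL_MAX);
    pool[pool_len] = s;
    return pool[pool_len++];
}

/* Uncached variant: one pool lookup per non-NA element. */
const char *string_from_logical_plain(int x)
{
    if (x == NA_LOGICAL_SKETCH) return NULL;   /* NULL stands for NA_STRING */
    return fake_mkchar(x ? "TRUE" : "FALSE");
}

/* Cached variant: each of the two possible results is looked up once. */
const char *string_from_logical_cached(int x)
{
    static const char *true_ch = NULL, *false_ch = NULL;
    if (x == NA_LOGICAL_SKETCH) return NULL;
    if (x) return true_ch ? true_ch : (true_ch = fake_mkchar("TRUE"));
    return false_ch ? false_ch : (false_ch = fake_mkchar("FALSE"));
}

/* Convert n values with conv and report how many pool lookups that took. */
long count_lookups(const char *(*conv)(int), const int *x, size_t n)
{
    long before = lookups;
    for (size_t i = 0; i < n; i++) (void) conv(x[i]);
    return lookups - before;
}
```

Calling `count_lookups(string_from_logical_plain, x, n)` on a vector gives one lookup per non-NA element, whereas `count_lookups(string_from_logical_cached, x, n)` gives at most two in total. That is the same effect the static `TrueCh` / `FalseCh` variables aim for in the proposal quoted above.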