On 02/24/2011 05:14 PM, Hadley Wickham wrote:
Note, however, that I've never seen evidence for a *practical*
difference in simple cases, nor in such cases as part of a
larger computation.
But I'm happy to see one if anyone has an interesting example.

E.g., I would typically never use  0L:100L  instead of 0:100
in an R script, because I think code readability (and
self-explainability) is of considerable importance too.

But : casts to integer anyway:
I know; I just thought that on _this_ thread I ought to write it with L ;-). And I don't think I write 1L : 100L in real life.

I use the L far more often as a reminder than for performance, particularly in function definitions.
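That "reminder" use looks something like the following sketch (function and argument names are hypothetical, just for illustration):

```r
## The 5L default documents that n is meant to be a whole number,
## even though R would happily accept a double here.
n_largest <- function(x, n = 5L) {
  head(sort(x, decreasing = TRUE), n)
}

is.integer(5L)  # TRUE: the L suffix makes an integer literal
is.integer(5)   # FALSE: a bare 5 is stored as double
```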


str(0:100)
  int [1:101] 0 1 2 3 4 5 6 7 8 9 ...

And performance in this case is (obviously) negligible:

library(microbenchmark)
microbenchmark(as.integer(c(0, 100)), times = 1000)
Unit: nanoseconds
                       min  lq median  uq   max
as.integer(c(0, 100)) 712 791    813 896 15840

(mainly included as opportunity to try out microbenchmark)

So you save ~800 ns, but typing two extra letters probably takes 0.2 s (100
wpm, ~5 letters per word + space = 0.1 s per letter), so it only saves
you time if you're going to be calling it more than 250000 times ;)
Calling something 250000 times does happen in my real life: I have, e.g., one data set with 2e5 spectra (and another batch of that size waiting for me), so anything done "for each spectrum" reaches this order of magnitude each time the function is needed.
Also, of course, the conversion time grows with the length of the vector.
On the other hand, in >95 % of cases, taking an hour to think about the algorithm will have a much larger effect ;-).
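The scaling with vector length can be sketched like this (timings are machine-dependent; the point is only that the long vector should take roughly 100x as long as the short one):

```r
library(microbenchmark)

## The cost of as.integer() is linear in the length of the input,
## so converting 1e6 doubles takes ~100x as long as 1e4 doubles.
x_short <- (1:1e4) + 0.0
x_long  <- (1:1e6) + 0.0
microbenchmark(as.integer(x_short), as.integer(x_long), times = 100)
```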

Also, I notice that the first few measurements from microbenchmark are often much longer (for fast operations), which may just indicate that the total speed depends much more on whether the code allows caching or not. That may mean such coding details may or may not help at all: a single, isolated conversion can take disproportionately more time.
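If those slow first iterations are just warm-up effects, microbenchmark can discard some iterations before timing starts; as I read ?microbenchmark, the `control` argument takes a `warmup` count (treat the exact option name as my assumption):

```r
library(microbenchmark)

## Run 10 throwaway iterations first, so caches and R's byte-code
## machinery are warm before the 1000 measured iterations begin.
microbenchmark(as.integer(c(0, 100)),
               times = 1000,
               control = list(warmup = 10))
```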

I just (yesterday) came across a situation where the difference between numeric and integer does matter (considering that I do this with an array of size ≈ 3e4 x 125 x 6): as.factor
> microbenchmark (i = as.factor (1:1e3), d = as.factor ((1:1e3)+0.0))
Unit: nanoseconds
       min      lq  median      uq     max
i   884039  891106  895847  901630 2524877
d  2698637 2770936 2778271 2807572 4266197

but then:
> microbenchmark (
sd = structure ((1:1e3)+0.0, .Label = 1:100, class = "factor"),
si = structure ((1:1e3)+0L, .Label = 1:100, class = "factor"))
Unit: nanoseconds
       min      lq  median      uq     max
sd   52875   53615   54040   54448 1385422
si   45904   46936   47332   47778   65360
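One caveat with the structure() shortcut (my own sketch, not from the timings above): with double codes the object claims to be a factor but is not well-formed internally, since R expects factor codes to be stored as integer:

```r
## structure() maps .Label to the "levels" attribute but does not
## coerce the storage mode of the codes.
sd <- structure((1:3) + 0.0, .Label = c("a", "b", "c"), class = "factor")
si <- structure(1:3,         .Label = c("a", "b", "c"), class = "factor")

is.factor(sd)  # TRUE, but ...
typeof(sd)     # "double": not what R's internal factor code expects
typeof(si)     # "integer": a well-formed factor
```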



Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
