On 02/24/2011 05:14 PM, Hadley Wickham wrote:
Note, however, that I've never seen evidence of a *practical*
difference in simple cases, nor in such cases as part of a
larger computation.
But I'm happy to see one if anyone has an interesting example.
E.g., I would typically never use 0L:100L instead of 0:100
in an R script, because I think code readability (and
self-explainability) is of considerable importance too.
But : casts to integer anyway:
I know - I just thought that on _this_ thread I ought to write it with L ;-) and
I don't think I write 1L : 100L in real life.
I use the L far more often as a reminder than for performance. Particularly in
function definitions.
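A minimal sketch of what that looks like in practice (the function and its name are hypothetical, not from the thread): a 1L default signals that the argument is meant to be an integer index, even though the performance difference is negligible.

```r
# Hypothetical example: the L suffix documents intent in the signature --
# n is an integer row index, not a general numeric value.
nth_row <- function(x, n = 1L) {
  stopifnot(n >= 1L)
  x[n, , drop = FALSE]
}

nth_row(data.frame(a = 1:3), 2L)
```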
str(0:100)
int [1:101] 0 1 2 3 4 5 6 7 8 9 ...
And performance in this case is (obviously) negligible:
library(microbenchmark)
microbenchmark(as.integer(c(0, 100)), times = 1000)
Unit: nanoseconds
min lq median uq max
as.integer(c(0, 100)) 712 791 813 896 15840
(mainly included as opportunity to try out microbenchmark)
So you save ~800 ns but typing two letters probably takes 0.2 s (100
wpm, ~ 5 letters per word + space = 0.1s per letter), so it only saves
you time if you're going to be calling it more than 125000 times ;)
Calling something 125000 times does happen in my real life. I have, e.g., one data
set with 2e5 spectra (and another batch of that size waiting for me), so anything
done "for each spectrum" reaches this number every time the function is needed.
Also, of course, the conversion time grows with the length of the vector.
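That scaling is easy to check with the same package; a sketch along these lines (the vector lengths are arbitrary, not from the original post):

```r
library(microbenchmark)

# Converting a longer double vector to integer takes proportionally longer,
# since as.integer() must touch every element.
x_short <- numeric(1e2)
x_long  <- numeric(1e5)

microbenchmark(
  short = as.integer(x_short),
  long  = as.integer(x_long),
  times = 100
)
```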
On the other hand, in > 95 % of cases, spending an hour thinking about the
algorithm will have a much larger effect ;-).
Also, I notice that the first few measurements from microbenchmark are often much
longer (for fast operations). This may just indicate that total speed depends much
more on whether the code allows caching or not, and in that case any such coding
detail may or may not help at all: a single such conversion may take
disproportionately more time.
I just (yesterday) came across a situation where the difference between numeric
and integer does matter (considering that I do this with an array of size
≈ 3e4 x 125 x 6): as.factor
> microbenchmark (i = as.factor (1:1e3), d = as.factor ((1:1e3)+0.0))
Unit: nanoseconds
min lq median uq max
i 884039 891106 895847 901630 2524877
d 2698637 2770936 2778271 2807572 4266197
but then:
> microbenchmark (
sd = structure ((1:1e3)+0.0, .Label = 1:100, class = "factor"),
si = structure ((1:1e3)+0L, .Label = 1:100, class = "factor"))
Unit: nanoseconds
min lq median uq max
sd 52875 53615 54040 54448 1385422
si 45904 46936 47332 47778 65360
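For the curious, the reason the structure() call works at all is that structure() translates the legacy attribute name .Label into "levels". A minimal sketch (my own toy example, not Claudia's data) showing that, when the integer codes and levels are already known, it can produce the same object as factor() while skipping factor()'s matching and sorting work:

```r
# structure() maps the legacy name .Label to the "levels" attribute, so this
# assembles a factor directly from known codes and levels.
x <- structure(1:3, .Label = c("a", "b", "c"), class = "factor")

identical(x, factor(c("a", "b", "c")))
```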
Cheers,
Claudia
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbelei...@units.it
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.