`$` and `[` are primitives, while `[.data.frame` is a longish R function
that does all sorts of clever things, so every df[3,"a"] call pays for
method dispatch plus a fair amount of interpreted R code.
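
A quick way to see the difference for yourself; this is only a sketch
using base R, and the line count it prints will vary between R versions:

```r
# Base R only; no packages assumed.
is.primitive(`$`)             # TRUE: implemented in C
is.primitive(`[`)             # TRUE: implemented in C
is.primitive(`[.data.frame`)  # FALSE: an ordinary R closure

# Rough size of the method's source, in deparsed lines (varies by R version).
length(deparse(`[.data.frame`))

df <- data.frame(a = 1:10)
# df$a extracts the column with the primitive `$`; the [3] then indexes
# a plain integer vector.
x <- df$a[3]
# df[3, "a"] dispatches to `[.data.frame`.
y <- df[3, "a"]
identical(x, y)               # TRUE: same value either way
```

Both routes give the same answer; only the amount of R-level code run on
the way differs.
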
On 19/10/11 22:34, Stavros Macrakis wrote:
I was surprised to find that df$a[1] is an order of magnitude faster than
df[1,"a"]:
df <- data.frame(a=1:10)
system.time(replicate(100000, df$a[3]))
   user  system elapsed
   0.36    0.00    0.36
system.time(replicate(100000, df[3,"a"]))
   user  system elapsed
   4.09    0.00    4.09
A priori, I'd have thought that combining the row and column selections into
a single operation would at worst be equally fast, at best would be faster
by having fewer intermediate results and avoiding redundant operations.
I thought this might be because df[,] builds a data frame before simplifying
it to a vector, but with drop=FALSE, which skips the simplification, it is
even slower, so that doesn't seem to be the problem:
system.time(replicate(100000, df[3,"a",drop=FALSE]))
   user  system elapsed
  15.00    0.00   14.99
I then wondered if it might be because '[' allows multiple columns and
handles rownames. Sure enough, '[[,]]', which allows only one column, and
does not handle rownames, is almost 3x faster:
system.time(replicate(100000, df[[3,"a"]]))
   user  system elapsed
   1.48    0.00    1.48
...but it is still 4x slower than $[].
The timings are not sensitive to the number of rows in df (except for the
drop=FALSE case, which is much slower for large dfs). I will be avoiding
[,] and [[,]] when I don't need their functionality, but I still wonder why
they should be so much slower than $[].
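
For anyone wanting to rerun the comparison in one go, here is a minimal
sketch. The names forms, n and elapsed are made up for illustration, n is
an arbitrary smaller repetition count than the 100000 used above, and
wrapping each expression in a function adds the same small call overhead
to every form, so only the relative timings are meaningful:

```r
df <- data.frame(a = 1:10)

# The four access forms compared above, wrapped as functions
# (hypothetical names, for illustration only).
forms <- list(
  dollar     = function() df$a[3],
  single     = function() df[3, "a"],
  single_df  = function() df[3, "a", drop = FALSE],
  double     = function() df[[3, "a"]]
)

# Sanity check: the three vector-returning forms agree, and the
# drop = FALSE form returns a one-cell data frame holding the same value.
values <- vapply(forms[c("dollar", "single", "double")],
                 function(f) f(), integer(1))
kept <- forms$single_df()

# Time each form; n is an assumption, raise it for steadier numbers.
n <- 10000
elapsed <- vapply(forms, function(f) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}, numeric(1))
print(elapsed)
```

The absolute numbers will depend on the machine and R version.
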
-s
R 2.13.1 on Windows 7, i7-860 (2.8GHz) 8GB RAM
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel