`$` and `[` are primitives while `[.data.frame` is a longish R function that does all sorts of clever things.
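
A quick way to see the difference for yourself is sketched below. The variable names are only illustrative, and the exact line count of `[.data.frame` will vary between R versions:

```r
# Primitives are implemented in C and called directly by the interpreter.
prim_dollar  <- is.primitive(`$`)
prim_bracket <- is.primitive(`[`)

# The data frame method is an ordinary (interpreted) R closure.
prim_df <- is.primitive(`[.data.frame`)

# Rough measure of how much R code runs for every df[i, j] call.
n_lines <- length(deparse(`[.data.frame`))

print(c(dollar = prim_dollar, bracket = prim_bracket, df_method = prim_df))
print(n_lines)
```

Printing `[.data.frame` at the prompt shows the argument checking, row-name handling and drop logic that every df[i, j] call goes through.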

On 19/10/11 22:34, Stavros Macrakis wrote:
I was surprised to find that df$a[1] is an order of magnitude faster than
df[1,"a"]:

df <- data.frame(a = 1:10)
system.time(replicate(100000, df$a[3]))
    user  system elapsed
    0.36    0.00    0.36

system.time(replicate(100000, df[3,"a"]))
    user  system elapsed
    4.09    0.00    4.09


A priori, I'd have thought that combining the row and column selections into
a single operation would at worst be equally fast, at best would be faster
by having fewer intermediate results and avoiding redundant operations.

I thought this might be because df[,] builds a data frame before simplifying
it to a vector, but with drop=FALSE it is even slower, so that doesn't seem to
be the problem:

system.time(replicate(100000, df[3,"a",drop=FALSE]))
    user  system elapsed
   15.00    0.00   14.99


I then wondered if it might be because '[' allows multiple columns and
handles rownames. Sure enough, '[[,]]', which allows only one column, and
does not handle rownames, is almost 3x faster:

system.time(replicate(100000, df[[3,"a"]]))
    user  system elapsed
    1.48    0.00    1.48


...but it is still 4x slower than $[].

The timings are not sensitive to the number of rows in df (except for the
drop=FALSE case, which is much slower for large dfs).  I will be avoiding
[,] and [[,]] when I don't need their functionality, but I still wonder why
they should be so much slower than $[].
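
As a sanity check that the timings compare like with like, here is a minimal sketch confirming that the forms above return the same element. The last two forms, df[["a"]][3] and .subset2(df, "a")[3], were not timed above; they are included only as other list-style ways to reach the column, and no timings are claimed for them:

```r
df <- data.frame(a = 1:10)

# The three forms timed above that return a bare vector element.
v_dollar  <- df$a[3]
v_single  <- df[3, "a"]
v_double  <- df[[3, "a"]]

# Other list-style extractions (not timed in this message).
v_list    <- df[["a"]][3]
v_subset2 <- .subset2(df, "a")[3]

# With drop = FALSE the result stays a one-row, one-column data frame.
d_nodrop  <- df[3, "a", drop = FALSE]

all_same <- identical(v_dollar, v_single) &&
  identical(v_dollar, v_double) &&
  identical(v_dollar, v_list) &&
  identical(v_dollar, v_subset2)
print(all_same)
print(class(d_nodrop))
```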

             -s

R 2.13.1 on Windows 7, i7-860 (2.8GHz) 8GB RAM


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
