Hello,
If you work with a matrix instead of a data.frame, subsetting usually
runs faster, but all columns must be of the same type (here, numeric).
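To see why the column types matter, here is a small sketch. The data frames num_df and mix_df are made up for illustration and are not part of the original example. As far as I know, as.matrix() on a data frame with any non-numeric column returns a character matrix, so numeric comparisons on it no longer behave as expected.

```r
# Hypothetical data frames, for illustration only
num_df <- data.frame(id = 1:3, foo = c(0.5, 1.5, 2.5))
mix_df <- data.frame(id = 1:3, label = c("a", "b", "c"))

num_mat <- as.matrix(num_df)  # all columns numeric: the matrix stays numeric
mix_mat <- as.matrix(mix_df)  # one character column: everything becomes character

is.numeric(num_mat)
is.character(mix_mat)
# In mix_mat the "id" column holds strings, so mix_mat[, "id"] == 1
# is a string comparison rather than a numeric one.
```

If the real data has factor or character columns, it may be safer to convert only the numeric columns, or to keep the data frame and use the matrix just for finding row numbers.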
### Fast, but not fast enough
system.time(replicate(500, tmp[which(tmp$id == idList[1]),]))
user system elapsed
0.05 0.00 0.04
### Not fast at all, a big bottleneck
system.time(replicate(500, subset(tmp, id == idList[1])))
user system elapsed
0.07 0.00 0.08
# Make it a matrix and use the matrix
mattmp <- as.matrix(tmp)
system.time(replicate(500, mattmp[which(mattmp[,"id"] == idList[1]),]))
user system elapsed
0.01 0.00 0.01
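A different idea, offered only as a sketch: every call above still compares the whole id column against one value. If all ids are going to be processed anyway, the column can be scanned once with split() and the row numbers for each id kept in a list. The name rowsById and the mean() step are placeholders of mine, not something from the original code, and I have not timed this on 13 million rows.

```r
# Same example data as in the original post
tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))

# Scan the id column once and keep the row numbers belonging to each id
rowsById <- split(seq_len(nrow(tmp)), tmp$id)

# Extracting one id is now a lookup by name instead of a full comparison
firstId <- as.character(tmp$id[1])
sub1 <- tmp[rowsById[[firstId]], ]

# Each group can then be passed to the per-id functions; mean() is a stand-in
fooMeans <- vapply(rowsById, function(rows) mean(tmp$foo[rows]), numeric(1))
```

The list is named by the id values converted to character, which is why the lookup uses as.character().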
Hope this helps,
Rui Barradas
Quoting Doran, Harold <hdo...@air.org>:
I have an extremely large data frame (~13 million rows) that
resembles the structure of the object tmp in the reproducible
code below. In my real data, the variable 'id' may or may not be
ordered, but I think that is irrelevant.
I have a process that requires subsetting the data by id and then
running each smaller data frame through a set of functions. One
example below uses indexing and the other uses an explicit call to
subset(). Both return the same result, but indexing is faster.
The problem is that in my real data, indexing must scan millions of
rows to evaluate the condition, which is expensive and a
bottleneck in my code. Can anyone recommend an approach that
would be less expensive and faster?
Thank you
Harold
tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))
idList <- unique(tmp$id)
### Fast, but not fast enough
system.time(replicate(500, tmp[which(tmp$id == idList[1]),]))
### Not fast at all, a big bottleneck
system.time(replicate(500, subset(tmp, id == idList[1])))
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.