Hello,

I have a fairly large data.frame (about 150,000 rows by 100
variables). Each row has a case ID and a date stamp, and there are
multiple rows per ID (i.e., records of people's activity).


I need to iterate over each person (record ID) in the data set, and then
process their data for each date.  The processing step is fast, and so is
the date handling; locating each person's records is the slow part.  I've
even tried using data.table, with ID set as the key (index), and it is
still slow.

According to Rprof, the slow line is:

    j <- which( d$id == person )

where d is my data.frame or data.table and person is the current ID.
I then process all the records indexed by j, which seems fast enough.
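
A minimal, self-contained sketch of the pattern described above. The toy
data, the column names date and value, and the function process_person()
are hypothetical stand-ins for illustration only, not the real data or
processing.

```r
# Toy stand-in for the real data; sizes and column names are assumptions.
set.seed(1)
n <- 20000
d <- data.frame(
  id    = sample(1:2000, n, replace = TRUE),
  date  = as.Date("2012-01-01") + sample(0:364, n, replace = TRUE),
  value = rnorm(n)
)

# Hypothetical placeholder for the per-date processing:
# the mean of `value` for each date in one person's rows.
process_person <- function(rows) {
  tapply(rows$value, format(rows$date), mean)
}

ids <- unique(d$id)
results <- vector("list", length(ids))
for (k in seq_along(ids)) {
  person <- ids[k]
  # This comparison scans the entire id column on every iteration.
  j <- which(d$id == person)
  results[[k]] <- process_person(d[j, ])
}
```

With this shape, the cost of the lookup grows with (number of IDs) times
(number of rows), since every iteration compares the whole id column.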

I thought that using the data.table indexing would speed things up, but
not in this case.
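
One possible explanation, offered as a guess: which(d$id == person)
compares the entire id column on every call whether or not a key is set,
so the key is never used. A keyed lookup would look more like the sketch
below. It assumes the data.table package is installed; the toy data and
the column name value are made up for illustration.

```r
library(data.table)

# Toy data; sizes and column names are assumptions.
set.seed(1)
dt <- data.table(
  id    = sample(1:1000, 20000, replace = TRUE),
  value = rnorm(20000)
)
setkey(dt, id)   # sorts dt by id and marks id as the key

person <- 42L

# Vector scan: compares every element of id, ignoring the key.
j_scan <- which(dt$id == person)

# Keyed lookup: binary search on the key.
rows_keyed <- dt[.(person)]                 # the matching rows
j_keyed    <- dt[.(person), which = TRUE]   # the matching row numbers

# Alternatively, process every id in one pass without an explicit loop.
per_id <- dt[, .(n = .N, mean_value = mean(value)), by = id]
```

Whether this helps with the real data would need to be confirmed with
Rprof or system.time().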

Any ideas on how to speed this up?


Thanks!

-- 
Noah Silverman, M.S., C.Phil
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095
