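What the Data Munger Guru said. Plus: this is almost certainly a job for ddply (from plyr) or data.table's by= grouping, which split the data by ID once instead of scanning d$id again for every person.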
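To make that concrete, here is a rough sketch of grouping once instead of calling which() per person. The toy data, the value column, and process_person() are all made up for illustration; only d, id, and the which() line come from the original question. The data.table part runs only if the package is installed.

```r
# Toy stand-in for the poster's data (assumed shape: id, date, one value column).
set.seed(1)
d <- data.frame(
  id    = sample(1:1000, 20000, replace = TRUE),
  date  = as.Date("2013-01-01") + sample(0:364, 20000, replace = TRUE),
  value = runif(20000)
)

# Hypothetical per-person step; substitute the real processing here.
process_person <- function(rows) sum(rows$value)

# Pattern from the question: every iteration scans the whole id column.
slow <- sapply(sort(unique(d$id)), function(person) {
  j <- which(d$id == person)
  process_person(d[j, ])
})

# Base R alternative: split() buckets the row indices by id in one pass.
idx  <- split(seq_len(nrow(d)), d$id)
fast <- sapply(idx, function(j) process_person(d[j, ]))

# data.table alternative: let by= do the grouping instead of looping.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(d)
  setkey(dt, id)                                  # sorts by id
  by_id <- dt[, .(total = sum(value)), by = id]   # one row per id
}
```

With plyr the same grouping would look something like ddply(d, "id", function(rows) data.frame(total = sum(rows$value))). The point in all three cases is that the rows are located once for all IDs, not once per ID.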

Noah Silverman-2 wrote
> Hello,
>
> I have a fairly large data.frame. (About 150,000 rows of 100
> variables.) There are case IDs, and multiple entries for each ID, with a
> date stamp. (i.e. records of peoples activity.)
>
> I need to iterate over each person (record ID) in the data set, and then
> process their data for each date. The processing part is fast, the date
> part is fast. Locating the records is slow. I've even tried using
> data.table, with ID set as the index, and it is still slow.
>
> The line with the slow process (According to Rprof) is:
>
> j <- which( d$id == person )
>
> (I then process all the records indexed by j, which seems fast enough.)
>
> where d is my data.frame or data.table
>
> I thought that using the data.table indexing would speed things up, but
> not in this case.
>
> Any ideas on how to speed this up?
>
> Thanks!
>
> --
> Noah Silverman, M.S., C.Phil
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/Thoughts-for-faster-indexing-tp4680854p4680889.html
Sent from the R help mailing list archive at Nabble.com.