Noah,

If N is the number of rows and k is the number of unique IDs, then a single
which() is O(N), so which() in a loop over the IDs is O(Nk). Sorting the
entire data set is O(N ln N), and then you can process it in contiguous
blocks, no which() required.

-Neal

On Thu, Nov 21, 2013 at 8:48 AM, William Dunlap <wdun...@tibco.com> wrote:

> > The line with the slow process (According to Rprof) is:
> >   j <- which( d$id == person )
> > (I then process all the records indexed by j, which seems fast enough.)
>
> Using split() once (and using its output in a loop) instead of == applied
> to a long vector many times, as in
>    for(j in split(seq_along(d$id), people)) {
>       # newdata[j,] <- process(data[j,])
>    }
> is typically faster.  But this is the sort of thing that tapply() and the
> functions in package:plyr do for you.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Noah Silverman
> > Sent: Wednesday, November 20, 2013 12:17 PM
> > To: 'R-help@r-project.org'
> > Subject: [R] Thoughts for faster indexing
> >
> > Hello,
> >
> > I have a fairly large data.frame.  (About 150,000 rows of 100
> > variables.) There are case IDs, and multiple entries for each ID, with a
> > date stamp.  (i.e. records of peoples activity.)
> >
> >
> > I need to iterate over each person (record ID) in the data set, and then
> > process their data for each date.  The processing part is fast, the date
> > part is fast.  Locating the records is slow.  I've even tried using
> > data.table, with ID set as the index, and it is still slow.
> >
> > The line with the slow process (According to Rprof) is:
> >
> >
> > j <- which( d$id == person )
> >
> > (I then process all the records indexed by j, which seems fast enough.)
> >
> > where d is my data.frame or data.table
> >
> > I thought that using the data.table indexing would speed things up, but
> > not in this case.
> >
> > Any ideas on how to speed this up?
> >
> >
> > Thanks!
> >
> > --
> > Noah Silverman, M.S., C.Phil
> > UCLA Department of Statistics
> > 8117 Math Sciences Building
> > Los Angeles, CA 90095
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
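
For reference, a minimal sketch of the sort-then-scan idea from the top of this message, assuming a data.frame d with an id column as in the original question. The toy data, the value column, and process() are hypothetical stand-ins and not anything from the thread. The which() loop at the end is only there to check that both routes give the same answer.

```r
# Toy data standing in for the poster's data.frame `d`
# (the `value` column is an assumption for illustration).
set.seed(1)
d <- data.frame(id = sample(1:5, 20, replace = TRUE),
                value = rnorm(20))

# Hypothetical per-person processing step; here just the mean of `value`.
process <- function(block) mean(block$value)

# Sort once by id: O(N log N).
d_sorted <- d[order(d$id), ]

# Run lengths of the sorted ids give the size of each contiguous block.
runs   <- rle(d_sorted$id)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1L

# Walk the blocks; each person's rows are starts[i]:ends[i],
# so no which() call is needed inside the loop.
result <- numeric(length(runs$values))
for (i in seq_along(runs$values)) {
  block <- d_sorted[starts[i]:ends[i], ]
  result[i] <- process(block)
}
names(result) <- runs$values

# Reference answer using the which() loop from the original question: O(N k).
slow <- sapply(sort(unique(d$id)), function(person) {
  j <- which(d$id == person)
  process(d[j, ])
})
```

Whether this beats split() on the real 150,000-row data is something to measure with Rprof or system.time rather than assume; both avoid rescanning the id column once per person.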