Noah,

Let N be the number of rows and k the number of unique IDs.

A single call to which() is O(N), so calling which() once per ID in a
loop is O(Nk).

Sorting the entire data set by ID is O(N log N). After that, each person's
records form a contiguous block, so you can process the data block by
block, no which() required.
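
A minimal sketch of the sort-then-blocks idea, assuming a data.frame d
with an id column. The toy data, the value column, and the sum() are
placeholders I made up to stand in for your real data and processing:

```r
# Toy data standing in for the real data.frame `d`
# (the `value` column is an assumption for illustration).
set.seed(1)
d <- data.frame(id = sample(c("a", "b", "c"), 12, replace = TRUE),
                value = seq_len(12),
                stringsAsFactors = FALSE)

# Sort once by id: O(N log N).
d <- d[order(d$id), , drop = FALSE]

# rle() reports the length of each run of equal ids in the sorted column,
# so each person's rows are the contiguous range starts[i]:ends[i].
runs   <- rle(d$id)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1L

result <- numeric(length(runs$values))
names(result) <- runs$values
for (i in seq_along(runs$values)) {
  j <- starts[i]:ends[i]          # row indices for this person, no which()
  result[i] <- sum(d$value[j])    # placeholder for the real per-person work
}
```

The loop touches each row once, so after the sort the total cost of
locating records is O(N) rather than O(Nk).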

-Neal

On Thu, Nov 21, 2013 at 8:48 AM, William Dunlap <wdun...@tibco.com> wrote:

> > The line with the slow process (According to Rprof) is:
> > j <- which( d$id == person )
> > (I then process all the records indexed by j, which seems fast enough.)
>
> Using split() once (and using its output in a loop) instead of == applied to
> a long vector many times, as in
>    for(j in split(seq_along(d$id), people)) {
>        # newdata[j,] <- process(data[j,])
>    }
> is typically faster.  But this is the sort of thing that tapply() and the
> functions in package:plyr do for you.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Noah Silverman
> > Sent: Wednesday, November 20, 2013 12:17 PM
> > To: 'R-help@r-project.org'
> > Subject: [R] Thoughts for faster indexing
> >
> > Hello,
> >
> > I have a fairly large data.frame (about 150,000 rows of 100
> > variables).  There are case IDs and multiple entries for each ID, each
> > with a date stamp (i.e., records of people's activity).
> >
> >
> > I need to iterate over each person (record ID) in the data set, and then
> > process their data for each date.  The processing part is fast and the
> > date part is fast; locating the records is slow.  I've even tried using
> > data.table, with ID set as the index, and it is still slow.
> >
> > The line with the slow process (According to Rprof) is:
> >
> >
> > j <- which( d$id == person )
> >
> > (I then process all the records indexed by j, which seems fast enough.)
> >
> > where d is my data.frame or data.table
> >
> > I thought that using the data.table indexing would speed things up, but
> > not in this case.
> >
> > Any ideas on how to speed this up?
> >
> >
> > Thanks!
> >
> > --
> > Noah Silverman, M.S., C.Phil
> > UCLA Department of Statistics
> > 8117 Math Sciences Building
> > Los Angeles, CA 90095
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
