Re: [R] Thoughts for faster indexing

2013-11-26 Thread Steve Lianoglou
Hi,
On Tue, Nov 26, 2013 at 11:41 AM, Noah Silverman wrote:
> All interesting suggestions.
>
> I guess a better example of the code would have been a good idea. So,
> I'll put a relevant snippet here.
>
> Rows are cases. There are multiple cases for each ID, marked with a
> date. I'm trying to

Re: [R] Thoughts for faster indexing

2013-11-26 Thread Noah Silverman
All interesting suggestions. I guess a better example of the code would have been a good idea. So, I'll put a relevant snippet here. Rows are cases. There are multiple cases for each ID, marked with a date. I'm trying to calculate a time recency weighted score for a covariate, added as a new c

Re: [R] Thoughts for faster indexing

2013-11-21 Thread MacQueen, Don
I have some processes where I do the same thing, iterate over subsets of a data frame. My data frame has ~250,000 rows, 30 variables, and the subsets are such that there are about 6000 of them. Performing a which() statement like yours seems quite fast. For example, wrapping unix.time() around th
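A rough sketch of the kind of timing described above, on simulated data of similar size. The data frame `df`, its columns `id` and `x`, and the ID value 42 are assumptions made for illustration, not the poster's data. system.time() is used here because unix.time() was deprecated in later R releases.

```r
# Simulated data: ~250,000 rows spread over ~6000 IDs (sizes taken from the
# message above; the column names and contents are assumptions).
set.seed(1)
n_rows <- 250000
n_ids  <- 6000
df <- data.frame(
  id = sample(n_ids, n_rows, replace = TRUE),
  x  = rnorm(n_rows)
)

# Time a single which() lookup for one ID.
timing <- system.time(rows <- which(df$id == 42L))
print(timing)
```

A single lookup like this is typically cheap; the cost only becomes visible when it is repeated once per ID inside a loop.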

Re: [R] Thoughts for faster indexing

2013-11-21 Thread jlh.membership
operations.
-----Original Message-----
From: Noah Silverman [mailto:noahsilver...@g.ucla.edu]
Sent: Wednesday, November 20, 2013 3:17 PM
To: 'R-help@r-project.org'
Subject: [R] Thoughts for faster indexing
Hello,
I have a fairly large data.frame. (About 150,000 rows of 100 variable

Re: [R] Thoughts for faster indexing

2013-11-21 Thread William Dunlap
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Noah Silverman
> Sent: Wednesday, November 20, 2013 12:17 PM
> To: 'R-help@r-project.org'
> Subject: [R] Thoughts for faster indexing
>
> Hello,
>
> I have a fairly large data.

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Bert Gunter
or use tapply(seq_len(nrow(d)), d$id, ...) or a wrapper version thereof (by, aggregate, ...). However, it would not surprise me if this does not help. I suspect that the problem is not what you think but in the code and context you omitted, as others have already noted.
-- Bert
On Thu, Nov 21, 2
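A minimal sketch of the tapply() suggestion above. The toy data frame `d`, its `value` column, and the per-group function (a sum) are assumptions made for illustration; only `d$id` and the tapply() call over row indices come from the message.

```r
# Toy data; the column names and values are assumptions for illustration.
d <- data.frame(
  id    = c("a", "b", "a", "c", "b", "a"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Group the row indices by id in one pass, then work on each group's rows.
group_sums <- tapply(
  seq_len(nrow(d)),
  d$id,
  function(rows) sum(d$value[rows])
)

print(group_sums)
```

Passing row indices rather than a single column lets the per-group function reach any column of `d`, which is what makes this a drop-in replacement for a which() call inside a loop.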

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Ben Bolker
Neal Fultz writes:
> Noah,
>
> If N is # of rows, k is # of unique IDs
>
> Using which() is O(N), using which() in a loop is going to be O(Nk);
>
> sorting the entire data is O(N ln N) and then you can process it in
> contiguous blocks, no which required.
>
> -Neal
You mi
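One way the sort-then-scan idea quoted above could look in base R. This is a sketch under assumptions: the toy data frame `d`, its `value` column, and the per-block mean are illustrative and not taken from the thread.

```r
# Toy data; column names and values are assumptions for illustration.
d <- data.frame(
  id    = c(3, 1, 2, 1, 3, 1),
  value = c(10, 20, 30, 40, 50, 60)
)

# Sort once by id: O(N log N).
d <- d[order(d$id), ]

# Run lengths of the sorted ids give the row range of each block.
runs   <- rle(d$id)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Each id now occupies a contiguous range of rows, so no which() is needed.
block_means <- vapply(
  seq_along(starts),
  function(i) mean(d$value[starts[i]:ends[i]]),
  numeric(1)
)
names(block_means) <- runs$values

print(block_means)
```

After the single sort, each of the k blocks is reached by index arithmetic alone, so the per-ID work no longer scans all N rows.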

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Carl Witthoft
What the Data Munger Guru said. Plus: this is almost certainly a job for ddply or data.table.
Noah Silverman-2 wrote
> Hello,
>
> I have a fairly large data.frame. (About 150,000 rows of 100
> variables.) There are case IDs, and multiple entries for each ID, with a
> date stamp. (i.e. records

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Rainer M Krug
On 11/21/13, 12:34, Jim Holtman wrote:
> You need to show the statement in context with the rest of the
> script. You need to tell us what you want to do, not how you want
> to do it.
Agreed - a few details will result in guesses (see my guess bel

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Neal Fultz
> TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> > Of Noah Silverman
> > Sent: Wednesday, November 20, 2013 12:17 PM
> > To: 'R-help@r-proj

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Ben Tupper
Hi,
On Nov 21, 2013, at 10:42 AM, "MacQueen, Don" wrote:
> I have some processes where I do the same thing, iterate over subsets of a
> data frame.
> My data frame has ~250,000 rows, 30 variables, and the subsets are such
> that there are about 6000 of them.
>
> Performing a which() statement l

Re: [R] Thoughts for faster indexing

2013-11-21 Thread Jim Holtman
You need to show the statement in context with the rest of the script. You need to tell us what you want to do, not how you want to do it.
Sent from my iPad
On Nov 20, 2013, at 15:16, Noah Silverman wrote:
> Hello,
>
> I have a fairly large data.frame. (About 150,000 rows of 100
> variabl

[R] Thoughts for faster indexing

2013-11-20 Thread Noah Silverman
Hello, I have a fairly large data.frame. (About 150,000 rows of 100 variables.) There are case IDs, and multiple entries for each ID, with a date stamp. (i.e. records of people's activity.) I need to iterate over each person (record ID) in the data set, and then process their data for each date