Hi,
On Tue, Nov 26, 2013 at 11:41 AM, Noah Silverman wrote:
> All interesting suggestions.
>
> I guess a better example of the code would have been a good idea. So,
> I'll put a relevant snippet here.
>
> Rows are cases. There are multiple cases for each ID, marked with a
> date. I'm trying to
All interesting suggestions.
I guess a better example of the code would have been a good idea. So,
I'll put a relevant snippet here.
Rows are cases. There are multiple cases for each ID, marked with a
date. I'm trying to calculate a time recency weighted score for a
covariate, added as a new c
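Noah's actual weighting scheme is cut off above, so this is only a sketch under assumptions. The data frame `d`, its columns `id`, `date` and `x`, the helper `recency_score()`, and the exponential decay rate `lambda` are all hypothetical. Each case gets the decay-weighted mean of `x` over the same ID's cases dated on or before it. The row indices are split by ID once, instead of calling which() for every ID.

```r
# Hypothetical example data: several dated cases per ID (not real data).
d <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = as.Date(c("2013-01-01", "2013-01-11", "2013-01-21",
                   "2013-02-01", "2013-02-03")),
  x    = c(1, 2, 3, 10, 20)
)

# Assumed weighting: exponential decay with rate `lambda` per day.
# Score of case i = weighted mean of x over cases j of the same ID with date_j <= date_i.
recency_score <- function(x, date, lambda = 0.1) {
  days <- as.numeric(date)
  age <- outer(days, days, "-")              # age[i, j] = days from case j to case i
  w <- ifelse(age >= 0, exp(-lambda * age), 0)  # later cases get zero weight
  as.vector(w %*% x) / rowSums(w)
}

# Group row numbers by ID in one pass, then score each group in one call.
idx <- split(seq_len(nrow(d)), d$id)
d$score <- NA_real_
for (i in idx) {
  d$score[i] <- recency_score(d$x[i], d$date[i])
}
d
```

Within one ID the work is quadratic in that ID's case count, which is usually small; the expensive part is locating the rows, and split() does that once for all IDs.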
I have some processes where I do the same thing: iterate over subsets of a
data frame.
My data frame has ~250,000 rows, 30 variables, and the subsets are such
that there are about 6000 of them.
Performing a which() statement like yours seems quite fast.
For example, wrapping unix.time() around th
operations.
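A scaled-down, hypothetical timing comparison using system.time() on synthetic data (50,000 rows, up to 1,000 groups, not the sizes or data from this thread). It contrasts one which() call per group with a single split() of the row indices. The per-group sums come out identical; only the indexing strategy differs. Timings vary by machine, so none are quoted here.

```r
set.seed(1)
# Hypothetical synthetic data, scaled down from the sizes in the thread.
n <- 50000
d <- data.frame(id = sample(1000, n, replace = TRUE), x = runif(n))
ids <- unique(d$id)

# Approach 1: one which() per group; each call scans all of d$id (O(N k)).
t_which <- system.time({
  res_which <- vapply(ids, function(g) sum(d$x[which(d$id == g)]), numeric(1))
})

# Approach 2: split the row indices by ID in a single pass, then visit groups.
t_split <- system.time({
  groups <- split(seq_len(n), d$id)
  res_split <- vapply(groups, function(i) sum(d$x[i]), numeric(1))
})

# split() orders groups by sorted ID, so reorder approach 1 before comparing.
stopifnot(isTRUE(all.equal(unname(res_which[order(ids)]), unname(res_split))))
c(which = t_which[["elapsed"]], split = t_split[["elapsed"]])
```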
-----Original Message-----
From: Noah Silverman [mailto:noahsilver...@g.ucla.edu]
Sent: Wednesday, November 20, 2013 3:17 PM
To: 'R-help@r-project.org'
Subject: [R] Thoughts for faster indexing
Hello,
I have a fairly large data.frame. (About 150,000 rows of 100
variable
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf
> Of Noah Silverman
> Sent: Wednesday, November 20, 2013 12:17 PM
> To: 'R-help@r-project.org'
> Subject: [R] Thoughts for faster indexing
>
> Hello,
>
> I have a fairly large data.
or use tapply(seq_len(nrow(d)), d$id, ...) or a wrapper version
thereof (by, aggregate, ...).
However, it would not surprise me if this does not help. I suspect
that the problem lies not where you think but in the code and context you
omitted, as others have already noted.
-- Bert
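A minimal sketch of the tapply() form, with a hypothetical data frame `d` and a hypothetical per-ID function `per_id()` standing in for the real processing. tapply() hands each group's row numbers to the function, so each ID's rows are located once rather than by a separate which() scan per ID. aggregate() gives the same result for simple summaries.

```r
# Hypothetical data: several rows per ID.
d <- data.frame(id = c("a", "b", "a", "c", "b"), x = c(1, 2, 3, 4, 5))

# Hypothetical per-ID work: the mean of x over that ID's rows.
per_id <- function(rows) mean(d$x[rows])

# tapply() groups the row numbers by ID and calls per_id() once per group.
res <- tapply(seq_len(nrow(d)), d$id, per_id)
res

# aggregate() wraps the same grouping for data frames.
agg <- aggregate(x ~ id, data = d, FUN = mean)
agg
```

Passing row numbers rather than a single column lets the function reach any number of columns of `d` for that ID.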
On Thu, Nov 21, 2
Neal Fultz gmail.com> writes:
>
> Noah,
>
> If N is # of rows, k is # of unique IDs
>
> Using which() is O(N), using which() in a loop is going to be O(Nk);
>
> sorting the entire data is O(N ln N) and then you can process it in
> contiguous blocks, no which required.
>
> -Neal
>
You mi
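Neal's sort-then-scan idea might look like this in base R, again on a toy data frame `d` (hypothetical). After one order(), rle() on the sorted IDs gives each block's length, and cumsum() turns those lengths into row ranges, so no per-ID which() is needed.

```r
# Hypothetical data with IDs in arbitrary order.
d <- data.frame(id = c(3, 1, 2, 1, 3, 3), x = c(10, 1, 5, 2, 20, 30))

# Sort once by ID (O(N log N)); rows of each ID are now contiguous.
d <- d[order(d$id), ]

# rle() gives each ID's block length; cumulative sums give start/end rows.
r <- rle(d$id)
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Visit each contiguous block by position; no scan over all N rows per ID.
totals <- numeric(length(r$values))
for (k in seq_along(r$values)) {
  rows <- starts[k]:ends[k]
  totals[k] <- sum(d$x[rows])
}
names(totals) <- r$values
totals
```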
What the Data Munger Guru said.
Plus: this is almost certainly a job for ddply or data.table.
Noah Silverman-2 wrote
> Hello,
>
> I have a fairly large data.frame. (About 150,000 rows of 100
> variables.) There are case IDs, and multiple entries for each ID, with a
> date stamp. (i.e. records
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11/21/13, 12:34, Jim Holtman wrote:
> You need to show the statement in context with the rest of the
> script. You need to tell us what you want to do, not how you want
> to do it.
Agreed - a few details will result in guesses (see my guess bel
TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Noah Silverman
> > Sent: Wednesday, November 20, 2013 12:17 PM
> > To: 'R-help@r-proj
Hi,
On Nov 21, 2013, at 10:42 AM, "MacQueen, Don" wrote:
> I have some processes where I do the same thing: iterate over subsets of a
> data frame.
> My data frame has ~250,000 rows, 30 variables, and the subsets are such
> that there are about 6000 of them.
>
> Performing a which() statement l
You need to show the statement in context with the rest of the script. You
need to tell us what you want to do, not how you want to do it.
On Nov 20, 2013, at 15:16, Noah Silverman wrote:
> Hello,
>
> I have a fairly large data.frame. (About 150,000 rows of 100
> variabl
Hello,
I have a fairly large data.frame. (About 150,000 rows of 100
variables.) There are case IDs, and multiple entries for each ID, with a
date stamp. (i.e., records of people's activity.)
I need to iterate over each person (record ID) in the data set, and then
process their data for each date
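For the two-level loop described in the original post (each person, then each date), one hedged possibility is to split the row numbers by ID once and then group each person's rows by date with tapply(). The data and the per-date step `per_date()` (here just a record count) are placeholders, not the poster's code.

```r
# Hypothetical activity records: several dates per person, sometimes repeated.
d <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = as.Date(c("2013-11-01", "2013-11-01", "2013-11-02",
                   "2013-11-01", "2013-11-03")),
  x    = c(1, 2, 3, 4, 5)
)

# Hypothetical per-person, per-date step: count that day's records.
per_date <- function(rows) length(rows)

# Group row numbers by person once; within each person, group again by date.
by_person <- lapply(split(seq_len(nrow(d)), d$id), function(rows) {
  tapply(rows, d$date[rows], per_date)
})
by_person
```

Because per_date() receives row numbers, it can read any columns of `d` for that person and date.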