Re: [R] Indexes on dataframe columns?

Duncan Murdoch Thu, 25 Oct 2007 07:08:34 -0700

On 10/25/2007 9:27 AM, Ranjan Bagchi wrote:
> Hi --
> 
> I'm working with some data frames with fairly high nrows (call it 8 
> columns, by 20,000 rows).  Are there any indexes on these columns?
> 
> When I do a df[df$foo == 42,] [which I think is idiomatic], am I doing a 
> linear search or something better?  If the column contents are ordered, 
> I'd like to at least be doing a naive binary search.


You're not doing a search at all: you are calculating a logical vector 
of TRUE and FALSE values, one per row, then selecting the rows 
corresponding to TRUE values.  No optimization is done, so it doesn't 
matter if the values are unique or sorted.
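
To make the two steps concrete, here is a minimal sketch.  The data frame 
and the column bar are made up for illustration; only foo and the value 42 
come from the question.

```r
# Hypothetical data: 20,000 rows, with an integer column named foo.
set.seed(1)
df <- data.frame(foo = sample(1:100, 20000, replace = TRUE),
                 bar = rnorm(20000))

# Step 1: compare every element of foo with 42, giving a logical vector.
keep <- df$foo == 42
length(keep)   # one entry per row of df

# Step 2: select the rows where the logical vector is TRUE.
hits <- df[keep, ]

# The one-line form does the same two steps.
identical(hits, df[df$foo == 42, ])
```

Every element of foo is compared no matter where the matches are, which is 
why sorting the column makes no difference here.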

20,000 rows is not a particularly large number nowadays, so this may be 
reasonable.  I believe you'll get a fast search if the foo column is 
used as row names (which requires its values to be unique), but you'll 
need to time it to be sure.  Then the indexing would be df["42", ].
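
One way to do that timing is sketched below.  The data are made up, foo is 
assumed to hold unique values (row names must be unique), and the repeat 
count of 200 is arbitrary.  The timings depend on your machine and R 
version, so treat this as a way to check rather than as a result.

```r
# Hypothetical data: foo holds unique values so it can serve as row names.
n <- 20000
df <- data.frame(foo = sample(n), bar = rnorm(n))

# Copy of the data frame that uses foo (as character strings) for row names.
df_rn <- df
rownames(df_rn) <- as.character(df$foo)

# Both forms pick out the same row.
by_compare <- df[df$foo == 42, ]
by_rowname <- df_rn["42", ]

# Time repeated lookups of each kind and compare the elapsed times.
system.time(for (i in 1:200) df[df$foo == 42, ])
system.time(for (i in 1:200) df_rn["42", ])
```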

If it's still too slow, I'd advise against using data frames.  Matrix 
indexing is much faster.
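
A rough sketch of the matrix alternative follows, assuming all columns are 
numeric so that they fit in one matrix (a matrix holds a single type).  The 
data and the repeat count are again made up, and you should run the timing 
yourself before relying on it.

```r
# Hypothetical all-numeric data frame.
n <- 20000
df <- data.frame(foo = sample(n), bar = rnorm(n))

# as.matrix() on an all-numeric data frame gives a numeric matrix
# whose column names are taken from the data frame.
m <- as.matrix(df)

# Same logical-vector selection, applied to the data frame and the matrix.
from_df <- df[df$foo == 42, ]
from_m  <- m[m[, "foo"] == 42, ]   # a single matching row comes back as a named vector

# Time repeated selections of each kind and compare the elapsed times.
system.time(for (i in 1:200) df[df$foo == 42, ])
system.time(for (i in 1:200) m[m[, "foo"] == 42, ])
```

Note that the matrix version still compares every element of the column; 
any gain comes from cheaper indexing, not from a smarter search.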

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
