On 10/25/2007 9:27 AM, Ranjan Bagchi wrote:
> Hi --
>
> I'm working with some data frames with fairly high nrows (call it 8
> columns, by 20,000 rows). Are there any indexes on these columns?
>
> When I do a df[df$foo == 42,] [which I think is idiomatic], am I doing a
> linear search or something better? If the column contents is ordered, I'd like
> to at least be doing a naive binary search.
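The naive binary search asked about above can be done by hand when the column is known to be sorted. This is a minimal sketch that assumes a sorted, NA-free numeric column and a single lookup value. The helper name rows_equal_sorted and its check_sorted argument are hypothetical. It relies on base R's findInterval(), which R's documentation describes as an O(log N) interval search per lookup.

```r
# Hypothetical helper: return the rows of `df` whose column `col` equals the
# single value `value`, ASSUMING `col` is sorted (non-decreasing) with no NAs.
# findInterval() locates the matching block by interval (binary) search
# instead of comparing every element, as df[df[[col]] == value, ] does.
rows_equal_sorted <- function(df, col, value, check_sorted = FALSE) {
  x <- df[[col]]
  # Verifying sortedness is itself a full linear pass, so it is off by default.
  if (check_sorted) stopifnot(!anyNA(x), !is.unsorted(x))
  n_less <- findInterval(value, x, left.open = TRUE)  # number of x <  value
  n_le   <- findInterval(value, x)                    # number of x <= value
  if (n_le == n_less) return(df[0, , drop = FALSE])   # value not present
  df[(n_less + 1L):n_le, , drop = FALSE]
}

# Example: 20,000 rows with foo sorted, each key appearing twice.
set.seed(1)
df <- data.frame(foo = rep(1:10000, each = 2), bar = runif(20000))
hit <- rows_equal_sorted(df, "foo", 42, check_sorted = TRUE)
stopifnot(isTRUE(all.equal(hit, df[df$foo == 42, ])))
```

Only the search is logarithmic here. Building the result data frame still costs something, and sorting the column in the first place is a one-off O(n log n) step.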
You're not doing a search at all: you are calculating a vector of TRUE and FALSE values, one per row, and then selecting the rows corresponding to the TRUE values. Both steps touch every row and no optimization is done, so it doesn't matter whether the values are unique or sorted. 20,000 rows is not a particularly large number nowadays, so this may be reasonable.

I believe you'll get a fast search if the foo column is used as row names; the indexing would then be df["42", ]. You'll need to time it to be sure.

If it's still too slow, I'd advise against using data frames at all: matrix indexing is much faster.

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
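The lookups discussed above (logical scan, row names, matrix) can be timed side by side. This is a minimal sketch that assumes randomly generated data with unique integer keys in foo, since row names must be unique. The data and variable names are hypothetical. Timings depend on the machine and R version, so no results are shown.

```r
# Hypothetical data shaped like the question: 20,000 rows, 8 numeric columns,
# with foo holding unique keys so it can serve as row names.
set.seed(42)
n  <- 20000
df <- data.frame(foo = sample.int(n), matrix(runif(n * 7), ncol = 7))
rownames(df) <- as.character(df$foo)

# 1. The idiom from the question: a logical vector over all rows.
by_scan <- df[df$foo == 42, ]

# 2. Index by row name instead.
by_name <- df["42", ]

# 3. The same lookup on a numeric matrix (as.matrix keeps the row names).
m <- as.matrix(df)
by_matrix <- m["42", ]

# All three should describe the same row.
stopifnot(isTRUE(all.equal(by_scan, by_name)),
          isTRUE(all.equal(unlist(by_name), by_matrix)))

# Time many repeated lookups of each kind.
reps <- 1000
system.time(for (i in seq_len(reps)) df[df$foo == 42, ])
system.time(for (i in seq_len(reps)) df["42", ])
system.time(for (i in seq_len(reps)) m[m[, "foo"] == 42, ])
system.time(for (i in seq_len(reps)) m["42", ])
```

Note that m[m[, "foo"] == 42, ] is still a linear scan. Switching to a matrix only removes the overhead of the data frame subsetting method, not the comparison against every row.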