This should work reasonably well w/ 0.7 indexes. Cassandra tracks statistics on index selectivity, so it would plan that query as "index lookup on e=5, then iterate over those results and return only rows that also have type=2."
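
To make that plan concrete, here is a toy Python sketch of the idea, not Cassandra's actual code. The in-memory `rows` dict, `build_index`, and `query` are all hypothetical names made up for illustration, and the sketch uses exact match counts where Cassandra would rely on its own selectivity estimates.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family: row key -> {column: value}.
rows = {
    "k1": {"type": 1, "a": 7, "b": 0, "c": 0},
    "k2": {"type": 2, "d": 1, "e": 5, "f": 0},
    "k3": {"type": 2, "d": 1, "e": 9, "f": 0},
    "k4": {"type": 2, "d": 3, "e": 5, "f": 2},
    "k5": {"type": 3, "g": 5, "h": 1, "i": 0},
}


def build_index(rows, column):
    """Map each value of `column` to the keys of the rows holding that value."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:  # rows lacking the column never enter the index
            index[columns[column]].append(key)
    return index


indexes = {"type": build_index(rows, "type"), "e": build_index(rows, "e")}


def query(rows, indexes, predicates):
    """Answer an AND of equality predicates, driven by the most selective index."""
    # Choose the predicate whose index entry lists the fewest rows.
    # (Exact counts here; an assumption standing in for real statistics.)
    driver = min(
        predicates,
        key=lambda col: len(indexes[col].get(predicates[col], [])),
    )
    candidates = indexes[driver].get(predicates[driver], [])
    # Iterate over those candidates and keep rows matching every predicate.
    return [
        key
        for key in candidates
        if all(rows[key].get(col) == val for col, val in predicates.items())
    ]


print(query(rows, indexes, {"type": 2, "e": 5}))
```

In this toy data the e=5 entry lists fewer rows than the type=2 entry, so the lookup starts from e=5 and the type=2 check runs only against those few candidates; the large type=2 entry is never scanned.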

On Thu, Apr 14, 2011 at 5:33 AM, David Boxenhorn <da...@taotown.com> wrote:
> Thank you for your answer, and sorry about the sloppy terminology.
>
> I'm thinking of the scenario where there are a small number of results in
> the result set, but there are billions of rows in the first of your
> secondary indexes.
>
> That is, I want to do something like (not sure of the CQL syntax):
>
> select * where type=2 and e=5
>
> where there are billions of rows of type 2, but some manageable number of
> those rows have e=5.
>
> As I understand it, secondary indexes are like column families, where each
> value is a column. So the billions of rows where type=2 would go into a
> single row of the secondary index. This sounds like a problem to me, is it?
>
> I'm assuming that the billions of rows that don't have column "e" at all
> (those rows of other types) are not a problem at all...
>
> On Thu, Apr 14, 2011 at 12:12 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
>>
>> Need to clear up some terminology here.
>> Rows have a key and can be retrieved by key. This is *sort of* the primary
>> index, but not primary in the normal RDBMS sense.
>> Rows can have different columns and the column names are sorted and can be
>> efficiently selected.
>> There are "secondary indexes" in cassandra 0.7 based on column
>> values http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>> So you could create secondary indexes on the a,e, and h columns and get
>> rows that have specific values. There are some limitations to secondary
>> indexes, read the linked article.
>> Or you can make your own secondary indexes using row keys as the index
>> values.
>> If you have billions of rows, how many do you need to read back at once?
>> Hope that helps
>> Aaron
>>
>> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
>>
>> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
>> different sets of columns?
>>
>> For example, let's say you have three types of objects (1, 2, 3) which
>> each had three members. If your rows had the following pattern
>>
>> type=1 a=? b=? c=?
>> type=2 d=? e=? f=?
>> type=3 g=? h=? i=?
>>
>> could you index "type" as your primary index, and also index "a", "e", "h"
>> as secondary indexes, to get the objects of that type that you are looking
>> for?
>>
>> Would it work if you had billions of rows of each type?
>>
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com