Re: Indexes on heterogeneous rows

2011-04-17 Thread Jonathan Ellis
Right. On Sun, Apr 17, 2011 at 4:23 AM, David Boxenhorn wrote: > Thanks, Jonathan. I think I understand now. > > To sum up: Everything would work, but if your only equality is on "type" > (all the rest inequalities), it could be very inefficient. > > Is that right? > > On Thu, Apr 14, 2011 at 7:2

Re: Indexes on heterogeneous rows

2011-04-17 Thread David Boxenhorn
Thanks, Jonathan. I think I understand now. To sum up: Everything would work, but if your only equality is on "type" (all the rest inequalities), it could be very inefficient. Is that right? On Thu, Apr 14, 2011 at 7:22 PM, Jonathan Ellis wrote: > On Thu, Apr 14, 2011 at 6:48 AM, David Boxenho

Re: Indexes on heterogeneous rows

2011-04-15 Thread Wangpei (Peter)
Boxenhorn; aaron morton Subject: Re: Indexes on heterogeneous rows This should work reasonably well w/ 0.7 indexes. Cassandra tracks statistics on index selectivity, so it would plan that query as "index lookup on e=5, then iterate over those results and return only rows that also have type=2."

Re: Indexes on heterogeneous rows

2011-04-14 Thread aaron morton
> (This is a case where 1/3 of the rows are of type 2, but, say only a few > hundred rows of type 2 have e=5.) How many rows would have e=5 without worrying about their type value? Aaron On 14 Apr 2011, at 23:48, David Boxenhorn wrote: > Thanks. I'm aware that I can roll my own. I wanted to av

Re: Indexes on heterogeneous rows

2011-04-14 Thread Jonathan Ellis
On Thu, Apr 14, 2011 at 6:48 AM, David Boxenhorn wrote: > The reason why I put "type" first is that queries on type will > always be an exact match, whereas the other clauses might be inequalities. Expression order doesn't matter, but as you imply, non-equalities can't be used in an index lookup

Re: Indexes on heterogeneous rows

2011-04-14 Thread Jonathan Ellis
This should work reasonably well w/ 0.7 indexes. Cassandra tracks statistics on index selectivity, so it would plan that query as "index lookup on e=5, then iterate over those results and return only rows that also have type=2." On Thu, Apr 14, 2011 at 5:33 AM, David Boxenhorn wrote: > Thank you
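
The plan described above ("index lookup on e=5, then iterate over those results and return only rows that also have type=2") can be sketched roughly as follows. This is only an in-memory illustration in Python, not Cassandra's actual implementation; `build_index`, `query`, and the sample rows are hypothetical names and data made up for the example, and "selectivity" is approximated here simply by counting matching keys.

```python
from collections import defaultdict

def build_index(rows, column):
    """Hypothetical secondary index: column value -> set of row keys."""
    index = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:
            index[cols[column]].add(key)
    return index

def query(rows, indexes, equalities):
    """Look up the most selective equality clause in its index,
    then filter the candidate rows on all clauses."""
    # Choose the clause whose index entry holds the fewest row keys.
    best = min(equalities,
               key=lambda c: len(indexes[c].get(equalities[c], ())))
    candidates = indexes[best].get(equalities[best], set())
    # Iterate over the candidates and keep rows matching every clause.
    return sorted(k for k in candidates
                  if all(rows[k].get(c) == v for c, v in equalities.items()))

# Made-up sample data: most rows are type 2, only one of them has e=5.
rows = {
    "r1": {"type": 2, "e": 5},
    "r2": {"type": 2, "e": 7},
    "r3": {"type": 1, "e": 5},
    "r4": {"type": 2, "e": 9},
}
indexes = {c: build_index(rows, c) for c in ("type", "e")}
result = query(rows, indexes, {"type": 2, "e": 5})
```

With this data the `e` index yields two candidates and the `type` index three, so the sketch starts from `e=5` and filters on `type`, which is the order the reply describes.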

Re: Indexes on heterogeneous rows

2011-04-14 Thread David Boxenhorn
Thanks. I'm aware that I can roll my own. I wanted to avoid that, for ease of use, but especially for atomicity concerns. I thought that the secondary index would bring into memory all keys where type=2, and then iterate over them to find keys where e=5. (This is a case where 1/3 of the rows are of t

Re: Indexes on heterogeneous rows

2011-04-14 Thread aaron morton
You could make your own inverted index by using keys like "e=5-type=2" where the columns are either the keys for the object or the objects themselves. Then just grab the full row back. If you know you always want to run queries like that. This recent discussion and blog post from Ed is good b
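
A hand-rolled inverted index of the kind suggested above might look something like this. It is a sketch under assumptions: a plain Python dict stands in for the index column family, the composite key follows the "e=5-type=2" pattern from the message, and `composite_key`, `add_object`, and `lookup` are hypothetical helper names. It does not address the atomicity concern raised elsewhere in the thread, since the index entry and the object are written separately.

```python
def composite_key(e, type_):
    """Build an index row key in the 'e=5-type=2' style from the message."""
    return f"e={e}-type={type_}"

def add_object(index, object_key, e, type_):
    """Record object_key under the index row for its (e, type) pair."""
    index.setdefault(composite_key(e, type_), set()).add(object_key)

def lookup(index, e, type_):
    """Grab the full index row back: all object keys for this (e, type)."""
    return sorted(index.get(composite_key(e, type_), ()))

index = {}
add_object(index, "obj1", e=5, type_=2)
add_object(index, "obj2", e=5, type_=2)
add_object(index, "obj3", e=7, type_=2)
matches = lookup(index, e=5, type_=2)
```

This only helps if the query shape is known in advance, as the message notes: each combination of columns to be queried needs its own key pattern.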

Re: Indexes on heterogeneous rows

2011-04-14 Thread David Boxenhorn
Thank you for your answer, and sorry about the sloppy terminology. I'm thinking of the scenario where there are a small number of results in the result set, but there are billions of rows in the first of your secondary indexes. That is, I want to do something like (not sure of the CQL syntax): s

Re: Indexes on heterogeneous rows

2011-04-14 Thread aaron morton
Need to clear up some terminology here. Rows have a key and can be retrieved by key. This is *sort of* the primary index, but not primary in the normal RDBMS sense. Rows can have different columns and the column names are sorted and can be efficiently selected. There are "secondary indexes" in