Re: Indexes on heterogeneous rows

David Boxenhorn Thu, 14 Apr 2011 04:49:03 -0700

Thanks. I'm aware that I can roll my own. I wanted to avoid that, for ease
of use, but especially for atomicity concerns.


I thought that the secondary index would bring into memory all keys where
type=2, and then iterate over them to find keys where=5. (This is a case
were 1/3 of the rows are of type 2, but, say only a few hundred rows of type
2 have e=5.) The reason why I put "type" first is that queries on type will
always be an exact match, whereas the other clauses might be inequalities.

On Thu, Apr 14, 2011 at 2:07 PM, aaron morton <aa...@thelastpickle.com>wrote:

> You could make your own inverted index by using keys like  "e=5-type=2"
> where the columns are either the keys for the object or the objects
> themselves. Then just grab the full row back. If you know you always want to
> run queries like that.
>
> This recent discussion and blog post from Ed is good background
> http://www.mail-archive.com/user@cassandra.apache.org/msg12136.html
>
> I'm not sure how efficient the join from "e" to type would be. AFAIK it
> will iterate all keys where e=5 and lookup corresponding rows to find out if
> type = 2.
>
> If know how you want to read things back and need to deal with lots-o-data
> I would start testing with custom indexes. Then compare to the built in
> ones, it should be reasonably simple add them for a test.
>
> <http://www.mail-archive.com/user@cassandra.apache.org/msg12136.html>Hope
> that helps.
> Aaron
>
> On 14 Apr 2011, at 22:33, David Boxenhorn wrote:
>
> Thank you for your answer, and sorry about the sloppy terminology.
>
> I'm thinking of the scenario where there are a small number of results in
> the result set, but there are billions of rows in the first of your
> secondary indexes.
>
> That is, I want to do something like (not sure of the CQL syntax):
>
> select * where type=2 and e=5
>
> where there are billions of rows of type 2, but some manageable number of
> those rows have e=5.
>
> As I understand it, secondary indexes are like column families, where each
> value is a column. So the billions of rows where type=2 would go into a
> single row of the secondary index. This sounds like a problem to me, is it?
>
>
> I'm assuming that the billions of rows that don't have column "e" at all
> (those rows of other types) are not a problem at all...
>
> On Thu, Apr 14, 2011 at 12:12 PM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> Need to clear up some terminology here.
>>
>> Rows have a key and can be retrieved by key. This is *sort of* the primary
>> index, but not primary in the normal RDBMS sense.
>> Rows can have different columns and the column names are sorted and can be
>> efficiently selected.
>> There are "secondary indexes" in cassandra 0.7 based on column values
>> http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>>
>> So you could create secondary indexes on the a,e, and h columns and get
>> rows that have specific values. There are some limitations to secondary
>> indexes, read the linked article.
>>
>> Or you can make your own secondary indexes using row keys as the index
>> values.
>>
>> If you have billions of rows, how many do you need to read back at once?
>>
>> Hope that helps
>> Aaron
>>
>> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
>>
>> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
>> different sets of columns?
>>
>> For example, let's say you have three types of objects (1, 2, 3) which
>> each had three members. If your rows had the following pattern
>>
>> type=1 a=? b=? c=?
>> type=2 d=? e=? f=?
>> type=3 g=? h=? i=?
>>
>> could you index "type" as your primary index, and also index "a", "e", "h"
>> as secondary indexes, to get the objects of that type that you are looking
>> for?
>>
>> Would it work if you had billions of rows of each type?
>>
>>
>>
>
>

Re: Indexes on heterogeneous rows

Reply via email to