Re: Indexes on heterogeneous rows

aaron morton Thu, 14 Apr 2011 13:26:54 -0700

> (This is a case were 1/3 of the rows are of type 2, but, say only a few 
> hundred rows of type 2 have e=5.)


How many rows would have e=5 without worrying about their type value?
 
Aaron

On 14 Apr 2011, at 23:48, David Boxenhorn wrote:

> Thanks. I'm aware that I can roll my own. I wanted to avoid that, for ease of 
> use, but especially for atomicity concerns. 
> 
> I thought that the secondary index would bring into memory all keys where 
> type=2, and then iterate over them to find keys where=5. (This is a case were 
> 1/3 of the rows are of type 2, but, say only a few hundred rows of type 2 
> have e=5.) The reason why I put "type" first is that queries on type will 
> always be an exact match, whereas the other clauses might be inequalities. 
> 
> On Thu, Apr 14, 2011 at 2:07 PM, aaron morton <aa...@thelastpickle.com> wrote:
> You could make your own inverted index by using keys like  "e=5-type=2" where 
> the columns are either the keys for the object or the objects themselves. 
> Then just grab the full row back. If you know you always want to run queries 
> like that. 
> 
> This recent discussion and blog post from Ed is good background 
> http://www.mail-archive.com/user@cassandra.apache.org/msg12136.html
> 
> I'm not sure how efficient the join from "e" to type would be. AFAIK it will 
> iterate all keys where e=5 and lookup corresponding rows to find out if type 
> = 2. 
> 
> If know how you want to read things back and need to deal with lots-o-data I 
> would start testing with custom indexes. Then compare to the built in ones, 
> it should be reasonably simple add them for a test.   
> 
> Hope that helps. 
> Aaron
>    
> On 14 Apr 2011, at 22:33, David Boxenhorn wrote:
> 
>> Thank you for your answer, and sorry about the sloppy terminology.
>> 
>> I'm thinking of the scenario where there are a small number of results in 
>> the result set, but there are billions of rows in the first of your 
>> secondary indexes.
>> 
>> That is, I want to do something like (not sure of the CQL syntax):
>> 
>> select * where type=2 and e=5
>> 
>> where there are billions of rows of type 2, but some manageable number of 
>> those rows have e=5.
>> 
>> As I understand it, secondary indexes are like column families, where each 
>> value is a column. So the billions of rows where type=2 would go into a 
>> single row of the secondary index. This sounds like a problem to me, is it?  
>> 
>> I'm assuming that the billions of rows that don't have column "e" at all 
>> (those rows of other types) are not a problem at all...
>> 
>> On Thu, Apr 14, 2011 at 12:12 PM, aaron morton <aa...@thelastpickle.com> 
>> wrote:
>> Need to clear up some terminology here. 
>> 
>> Rows have a key and can be retrieved by key. This is *sort of* the primary 
>> index, but not primary in the normal RDBMS sense. 
>> Rows can have different columns and the column names are sorted and can be 
>> efficiently selected.
>> There are "secondary indexes" in cassandra 0.7 based on column values 
>> http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>> 
>> So you could create secondary indexes on the a,e, and h columns and get rows 
>> that have specific values. There are some limitations to secondary indexes, 
>> read the linked article. 
>> 
>> Or you can make your own secondary indexes using row keys as the index 
>> values.
>> 
>> If you have billions of rows, how many do you need to read back at once?
>> 
>> Hope that helps
>> Aaron
>>     
>> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
>> 
>>> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have 
>>> different sets of columns?
>>> 
>>> For example, let's say you have three types of objects (1, 2, 3) which each 
>>> had three members. If your rows had the following pattern
>>> 
>>> type=1 a=? b=? c=?
>>> type=2 d=? e=? f=?
>>> type=3 g=? h=? i=?
>>> 
>>> could you index "type" as your primary index, and also index "a", "e", "h" 
>>> as secondary indexes, to get the objects of that type that you are looking 
>>> for?
>>> 
>>> Would it work if you had billions of rows of each type?
>> 
>> 
> 
>

Re: Indexes on heterogeneous rows

Reply via email to