embedded response, way down below...

On 2010-04-26, at 12:56 PM, Ryan King wrote:

> On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison
> <hutch-li...@recursive.ca> wrote:
>> 
>> Hi,
>> 
>> I'm new to Cassandra and trying to work out how to do something that I've 
>> implemented any number of times before (e.g. with TokyoCabinet, Perst, even 
>> the filesystem using grep :-). I've managed to get some of this working in 
>> Cassandra but not all.
>> 
>> So here's the core of the situation.
>> 
>> I have this opaque chunk of data that I want to store in Cassandra and then 
>> find it again.
>> 
>> I can generate a key when the data is created very easily, and I've stored 
>> it in a straightforward manner: in a column with a key whose value is the 
>> data. And I can retrieve it when I know the key. No difficulties here at 
>> all, works fine.
>> 
>> Now I want to index this data taking what I imagine to be a pretty typical 
>> approach.
>> 
>> Let's say there are two many-to-one indexes: 'colour' and 'size'. Each 
>> colour value will have more than one chunk of data, same for size.
>> 
>> What I thought I'd do is make a super column and index the chunk of data 
>> kind of like: { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1}} with 
>> the key equal to the key of the chunk of data. And Cassandra stores it 
>> without error like that. So using the Ruby gem, it'd be something along the 
>> lines of:
>> 
>> cassandra.insert(:Indexes, key-of-the-chunk-of-data, { 'colour' => { 'blue' 
>> => 1 }, 'size' => { 'large' => 1 } })
>> 
>> Q1: is this a reasonable approach? It *seems* to be what I've read is 
>> supposed to be done. The 1 is meaningless. Anyway, it executes without error 
>> in Ruby.
> 
> No. In order to index your data, you need to invert it. Since you're
> working in ruby I'd recommend CassandraObject:
> http://github.com/nzKoz/cassandra_object. It has indexing built in.
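
For what it's worth, "inverting" the data can be sketched without any client library at all. The snippet below is only an illustration in plain Python: a dict stands in for a hypothetical 'Indexes' column family, and the 'colour:blue' row-key format and the function names index_chunk and find_chunks are made up for the example; they are not part of Cassandra, the cassandra gem, or CassandraObject. The idea is that the indexed value becomes the row key and the chunk keys become column names, so a lookup is a plain get by key rather than a range scan.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for an 'Indexes' column family:
# row key -> {column name -> value}.
indexes = defaultdict(dict)

def index_chunk(chunk_key, attributes):
    """Record chunk_key under one index row per (attribute, value) pair."""
    for name, value in attributes.items():
        row_key = f"{name}:{value}"      # assumed key format, e.g. 'colour:blue'
        indexes[row_key][chunk_key] = 1  # the 1 is a meaningless placeholder

def find_chunks(name, value):
    """Return the chunk keys indexed under name=value, sorted."""
    return sorted(indexes.get(f"{name}:{value}", {}))

index_chunk("chunk-1", {"colour": "blue", "size": "large"})
index_chunk("chunk-2", {"colour": "blue", "size": "small"})
index_chunk("chunk-3", {"colour": "red", "size": "large"})

blue = find_chunks("colour", "blue")
large = find_chunks("size", "large")
```

With a real client the two functions would presumably become an insert and a get on the index row, but that mapping is an assumption and is not shown here.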

Thanks Ryan. I don't really want to add a lot of layers of abstraction here, 
since what I'm writing is itself an abstraction. Worse, I can't get 
cassandra_object to install, some kind of gem issue. Anyway...

I dusted off my 20-years-ago experience with Python (i.e. with the help of 
Google), downloaded and installed pycassa (and thrift itself) and played around 
a bit. I find that the following Python/pycassa snippet works just fine (or 
well enough).

import pycassa

client = pycassa.connect()
indexes_scf = pycassa.ColumnFamily(client, 'Play', 'Indexes', super=True)
rows = list(indexes_scf.get_range(column_start='blue', column_finish='blue', 
super_column='colour'))

The data was inserted using Ruby but not read back with it because, as I said 
(below now), I don't know how to write the equivalent of the 
indexes_scf.get_range call in the snippet. So a simpler question: how do you 
write the equivalent of that in Ruby using the cassandra gem?

Cheers,
Bob

> 
> -ryan
> 
>> Q2: what is the syntax of the (Ruby) query to find the keys of all 'blue' 
>> chunks of data? I'm assuming get_range is the correct method, but what are 
>> the parameters? The docs say: get_range(column_family, options={}) but that 
>> seems to be missing a bit of detail, in particular the super column name.
>> 
>> Q2a: So I know there's a :start and :finish key supported in the options 
>> hash, inclusive, exclusive respectively. How do you define a range for 
>> equals with a UTF8 key? Surely not 'blue'.succ?? or by some kind of suffix??
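
On the range question, a library-free sketch may help. It assumes the semantics described above (start inclusive, finish exclusive) and plain code-point ordering of the names; whether the gem or a given comparator actually behaves that way is not verified here. The helper half_open_range is hypothetical. Under those assumptions, the smallest string greater than 'blue' is 'blue' followed by a NUL character, so that is the finish value that selects 'blue' alone, while a finish of 'bluf' (what 'blue'.succ gives in Ruby) selects everything that starts with 'blue'.

```python
from bisect import bisect_left

def half_open_range(sorted_names, start, finish):
    """Return the names n with start <= n < finish from a sorted list."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_left(sorted_names, finish)
    return sorted_names[lo:hi]

names = sorted(["black", "blue", "blueish", "bluf", "green"])

# Equality: the finish is the immediate successor of 'blue' in string order.
exact = half_open_range(names, "blue", "blue\x00")

# Prefix match: the finish is 'blue' with its last character incremented.
prefix = half_open_range(names, "blue", "bluf")
```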
>> 
>> Q2b: How do you specify the super column name 'colour'? Looking at the 
>> (Ruby) source of the get_range method, I'm unconvinced that this is 
>> implemented (a constant '' seems to be used where the super column name 
>> would make sense).
>> 
>> Anyway I ended up hacking at the Ruby gem's source to use the column name 
>> where the '' was in the original, and didn't really get anywhere useful (I 
>> can find nothing, or everything, nothing in between).
>> 
>> Q3: If I am correct about what is supposed to be done, does the Ruby gem 
>> support it?
>> 
>> Q4: Does anyone know of some Ruby code that does an indexed lookup that 
>> they could point me at? (There's lots of code that indexes but nothing that 
>> searches by the index.)
>> 
>> I'll try to take a look at some of the other Cassandra client 
>> implementations and see if I can get this model to work. Maybe just a Ruby 
>> problem?? With any luck, it'll be me messing up.
>> 
>> If it'd help I can post the source of what I have, but it'll need some 
>> cleanup. Let me know.
>> 
>> Thanks for taking the time to read this far :-)
>> 
>> Bob
>> 
>> ----
>> Bob Hutchison
>> Recursive Design Inc.
>> http://www.recursive.ca/
>> weblog: http://xampl.com/so




