embedded response, way down below...
On 2010-04-26, at 12:56 PM, Ryan King wrote:

> On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison
> <hutch-li...@recursive.ca> wrote:
>>
>> Hi,
>>
>> I'm new to Cassandra and trying to work out how to do something that I've
>> implemented any number of times (e.g. TokyoCabinet, Perst, even the
>> filesystem using grep :-) I've managed to get some of this working in
>> Cassandra but not all.
>>
>> So here's the core of the situation.
>>
>> I have this opaque chunk of data that I want to store in Cassandra and then
>> find it again.
>>
>> I can generate a key when the data is created very easily, and I've stored
>> it in a straight forward manner: in a column with a key whose value is the
>> data. And I can retrieve it when I know the key. No difficulties here at
>> all, works fine.
>>
>> Now I want to index this data taking what I imagine to be a pretty typical
>> approach.
>>
>> Lets say there's two many-to-one indexes: 'colour', and 'size'. Each colour
>> value will have more than one chunk of data, same for size.
>>
>> What I thought I'd do is make a super column and index the chunk of data
>> kind of like: { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1}} with
>> the key equal to the key of the chunk of data. And Cassandra stores it
>> without error like that. So using the Ruby gem, it'd be something along the
>> lines of:
>>
>> cassandra.insert(:Indexes, key-of-the-chunk-of-data, { 'colour' => { 'blue'
>> => 1 }, 'size' => { 'large' => 1 } })
>>
>> Q1: is this a reasonable approach? It *seems* to be what I've read is
>> supposed to be done. The 1 is meaningless. Anyway, it executes without error
>> in Ruby.
>
> No. In order to index your data, you need to invert it. Since you're
> working in ruby I'd recommend CassandraObject:
> http://github.com/nzKoz/cassandra_object. It has indexing built in.

Thanks Ryan. I don't really want to add a lot of layers of abstraction here, since what I'm writing is itself an abstraction.
Worse, I can't get cassandra_object to install; some kind of gem issue.

Anyway... I dusted off my 20-years-ago experience with Python (i.e. with the help of Google), downloaded and installed pycassa (and thrift itself) and played around a bit. I find that the following Python/pycassa snippet works just fine (or well enough).

    import pycassa
    client = pycassa.connect()
    # 'Indexes' is the super column family, in the 'Play' keyspace
    indexes_scf = pycassa.ColumnFamily(client, 'Play', 'Indexes', super=True)
    # slice only the 'blue' column inside the 'colour' super column
    rows = list(indexes_scf.get_range(column_start='blue', column_finish='blue', super_column='colour'))

The data was inserted using Ruby, but not read with it because, as I said (below now), I don't know how to write the equivalent of the indexes_scf.get_range call in the snippet.

So, a simpler question: how do you write the equivalent of that in Ruby, using the cassandra gem?

Cheers,
Bob

>
> -ryan
>
>> Q2: what is the syntax of the (Ruby) query to find the keys of all 'blue'
>> chunks of data? I'm assuming get_range is the correct method, but what are
>> the parameters? The docs say: get_range(column_family, options={}) but that
>> seems to be missing a bit of detail, in particular the super column name.
>>
>> Q2a: So I know there's a :start and :finish key supported in the options
>> hash, inclusive, exclusive respectively. How do you define a range for
>> equals with a UTF8 key? Surely not 'blue'.succ?? or by some kind of suffix??
>>
>> Q2b: How do you specify the super column name 'colour'? Looking at the
>> (Ruby) source of the get_range method and I'm unconvinced that this is
>> implemented (seems to be a constant '' used where the super column name
>> makes sense to be.)
>>
>> Anyway I ended up hacking at the Ruby gem's source to use the column name
>> where the '' was in the original, and didn't really get anywhere useful (I
>> can find nothing, or everything, nothing in between).
>>
>> Q3: If I am correct about what is supposed to be done, does the Ruby gem
>> support it?
>>
>> Q4: Does anyone know of some Ruby code that does and indexed lookup that
>> they could point me at. (lots of code that indexes but nothing that searches
>> by the index)
>>
>> I'll try to take a look at some of the other Cassandra client
>> implementations and see if I can get this model to work. Maybe just a Ruby
>> problem?? With any luck, it'll be me messing up.
>>
>> If it'd help I can post the source of what I have, but it'll need some
>> cleanup. Let me know.
>>
>> Thanks for taking the time to read this far :-)
>>
>> Bob
>>
>> ----
>> Bob Hutchison
>> Recursive Design Inc.
>> http://www.recursive.ca/
>> weblog: http://xampl.com/so
>>

----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so
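
For reference, here is a minimal sketch of what "inverting" the data could look like, using plain Python dictionaries as a stand-in for Cassandra rows. No client library is involved, and this is not the cassandra gem's or pycassa's API. The names `data_rows`, `index_rows`, `store` and `find_keys` are hypothetical, made up for illustration. The assumption is that each index value (e.g. colour = blue) gets its own row, whose column names are the keys of the matching chunks:

```python
from collections import defaultdict

# Stand-in for the column family holding the opaque chunks: key -> data.
data_rows = {}

# Stand-in for an inverted index column family (hypothetical layout):
# row key "colour:blue" -> {chunk_key: 1, ...}; the 1 is a meaningless placeholder.
index_rows = defaultdict(dict)


def store(chunk_key, data, **attributes):
    """Store a chunk and write one inverted index entry per attribute."""
    data_rows[chunk_key] = data
    for name, value in attributes.items():
        index_rows[f"{name}:{value}"][chunk_key] = 1


def find_keys(name, value):
    """Return the sorted keys of all chunks indexed under name=value."""
    # .get() avoids creating an empty index row for values never stored.
    return sorted(index_rows.get(f"{name}:{value}", {}))


store("chunk-1", b"first chunk", colour="blue", size="large")
store("chunk-2", b"second chunk", colour="blue", size="small")
store("chunk-3", b"third chunk", colour="red", size="large")

print(find_keys("colour", "blue"))
print(find_keys("size", "large"))
```

With this layout, finding all 'blue' chunks is a single read of one index row by its key, rather than a range scan across every data row looking for a matching column.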