Same answer as on the other thread right now about how to index:

http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
http://www.slideshare.net/benjaminblack/cassandra-basics-indexing

On Fri, Aug 6, 2010 at 6:18 PM, Mark <static.void....@gmail.com> wrote:
> On 8/6/10 4:50 PM, Thomas Heller wrote:
>>>
>>> Thanks for the suggestion.
>>>
>>> I somewhat understand all that; the point where my head begins to
>>> explode is when I want to figure out something like
>>>
>>> Continuing with your example: "Over the last X amount of days give me all
>>> the logs for remote_addr:XXX".
>>> I'm guessing I would need to create a separate index ColumnFamily???
>>>
>>
>> Depending on your needs you can either insert them directly or pull
>> them out later in some map/reduce fashion. What you want is another
>> ColumnFamily and a similar structure.
>>
>> ColumnFamily Standard "LogByRemoteAddrAndDate" CompareWith: TimeUUID
>>
>> Row: "127.0.0.1:20100806" Column TimeUUID/JSON as usual. If you want
>> to "link" to the actual log record (to avoid writing it multiple
>> times) just insert the same timeuuid you inserted into the other CF
>> and leave the value empty. So you have your "Index", aka list of
>> column names, and you can look up the actual values using get_slice
>> with column_names.
>>
>> Confusing at first, but really quite simple once you get used to the
>> idea. Just a lot more work than letting SQL do it for you. ;)
>>
>> HTH,
>> /thomas
>>
>
> Ok, I think the part I was missing was the concatenation of the key and
> partition to do the lookups. Is this the preferred way of accomplishing
> needs such as this? Are there alternative ways?
>
> How would one then "query" over multiple days? Same question for all days.
> Should I use range_slice or multiget_slice? And if it's range_slice, does that
> mean I need OrderPreservingPartitioner?
>
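
On the multi-day question, here is a rough sketch of the row-key scheme Thomas describes. It uses plain Python dicts as a stand-in for the "LogByRemoteAddrAndDate" ColumnFamily, so nothing here is real client API: index_row_key, insert_log, row_keys_for_last_days and multiget are hypothetical helper names made up for illustration. The idea is that the client computes one row key per day and asks for exactly those rows.

```python
import uuid
from datetime import date, timedelta

# In-memory stand-in for the "LogByRemoteAddrAndDate" ColumnFamily
# (an assumption for illustration, not a Cassandra client):
# row key -> {column name (time-based UUID) -> value}
log_by_remote_addr_and_date = {}


def index_row_key(remote_addr, day):
    """Build the row key "<remote_addr>:<YYYYMMDD>", e.g. "127.0.0.1:20100806"."""
    return "%s:%s" % (remote_addr, day.strftime("%Y%m%d"))


def insert_log(remote_addr, day, payload=""):
    """Add one column to the index row; returns the generated column name.

    An empty payload models the "link only" variant, where the same
    time-based UUID is also the column name in the main log CF.
    """
    column_name = uuid.uuid1()  # version 1 UUIDs are time-based
    row = log_by_remote_addr_and_date.setdefault(
        index_row_key(remote_addr, day), {})
    row[column_name] = payload
    return column_name


def row_keys_for_last_days(remote_addr, end_day, days):
    """One row key per day, newest first, covering `days` days up to end_day."""
    return [index_row_key(remote_addr, end_day - timedelta(days=n))
            for n in range(days)]


def multiget(row_keys):
    """Fetch each requested row by its exact key; missing rows come back empty."""
    return {key: log_by_remote_addr_and_date.get(key, {}) for key in row_keys}
```

For "the last 3 days for 127.0.0.1" you would call multiget(row_keys_for_last_days("127.0.0.1", date(2010, 8, 6), 3)). Because every key is known up front, this shape corresponds to a multiget_slice-style lookup and, as far as I understand it, does not rely on key ordering; scanning a range of keys with range_slice is the case where OrderPreservingPartitioner comes into play. "All days" has no fixed key list, so it would need either a range scan or a separate row listing the days that exist.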