On Sat, Jul 16, 2011 at 7:08 PM, Tristan Seligmann <[email protected]> wrote:
> I'm trying to model a schema for a logging storage system in
> Cassandra: Log messages consist of a timestamp, message, and some
> other arbitrary key/value pairs. Querying would primarily be done
> based on timestamp ranges; I will probably be doing filtering based on
> matches against the key/value pairs as well, but I expect that will be
> handled by fetching the messages in the desired time range, then
> filtering out the uninteresting ones.
I recommend reading this:
http://blog.insidesystems.net/basic-time-series-with-cassandra

> A supercolumn makes it easy enough to store the key/value pairs as
> columns, but then I end up with all of my log messages in a single
> row, which obviously won't work. On the other hand, if I use the
> timestamp as the row key, I need to use OPP to query on ranges, and
> I'd prefer not to deal with the balancing issues that would raise. I
> suppose I could go halfway; use a prefix of the timestamp (eg. date +
> hour, or perhaps date + hour + minute) as the key, and then retrieve
> all of the keys in the range I'm interested in when performing a
> query.

Do the latter and avoid OPP. Chunking by hour should be sufficient in
most cases.

-Brandon
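
For illustration, a minimal sketch of the hour-chunking idea in Python. It only computes the row keys that cover a time range; it does not talk to Cassandra. The key format (UTC date + hour as "YYYYMMDDHH") and the function names are assumptions made for this example, not something prescribed in the thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical key layout: one row per UTC hour, keyed "YYYYMMDDHH".
KEY_FORMAT = "%Y%m%d%H"


def bucket_key(ts):
    """Return the row key of the hourly bucket that contains ts."""
    return ts.astimezone(timezone.utc).strftime(KEY_FORMAT)


def bucket_keys(start, end):
    """Return every hourly row key needed to cover [start, end]."""
    # Truncate the start down to the top of its hour.
    cur = start.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    end = end.astimezone(timezone.utc)
    keys = []
    while cur <= end:
        keys.append(cur.strftime(KEY_FORMAT))
        cur += timedelta(hours=1)
    return keys


if __name__ == "__main__":
    start = datetime(2011, 7, 16, 18, 30, tzinfo=timezone.utc)
    end = datetime(2011, 7, 16, 20, 5, tzinfo=timezone.utc)
    print(bucket_keys(start, end))
```

A query for a time range would then fetch the rows for these keys with whatever client is in use and discard entries from the first and last bucket that fall outside the requested range. Because the keys are computed on the client, no ordered partitioner is needed.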
