I'm trying to model a schema for a logging storage system in Cassandra. Each log message consists of a timestamp, a message, and a set of arbitrary key/value pairs. Queries will primarily select timestamp ranges. I will probably also filter on matches against the key/value pairs, but I expect to handle that by fetching the messages in the desired time range and then discarding the uninteresting ones on the client.
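The client-side filtering step described above could look roughly like this. It is a minimal sketch, assuming the messages for the time range have already been fetched into memory; the `LogMessage` type and the `filter_by_attrs` helper are hypothetical names, not part of any Cassandra client.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class LogMessage:
    # Hypothetical in-memory shape of one log entry.
    timestamp: datetime
    message: str
    attrs: Dict[str, str] = field(default_factory=dict)


def filter_by_attrs(messages: List[LogMessage], wanted: Dict[str, str]) -> List[LogMessage]:
    """Keep only messages whose attrs contain every key/value pair in `wanted`."""
    return [
        m for m in messages
        if all(k in m.attrs and m.attrs[k] == v for k, v in wanted.items())
    ]
```

An empty `wanted` dict keeps every message, so the same code path works whether or not a key/value filter was requested.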
A super column makes it easy enough to store the key/value pairs as columns, but then all of my log messages end up in a single row, which won't work: every column of a row lives on the same replica nodes, so that one row would grow without bound. On the other hand, if I use the timestamp as the row key, I need the OrderPreservingPartitioner (OPP) to query on ranges, and I'd prefer not to deal with the load-balancing issues that would raise. I suppose I could go halfway: use a prefix of the timestamp (e.g. date + hour, or perhaps date + hour + minute) as the row key, and then retrieve all of the keys in the range I'm interested in when performing a query. I feel like I'm missing something, though, so I was hoping for some advice from more experienced users of Cassandra.

-- mithrandi, i Ainil en-Balandor, a faer Ambar
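The date + hour bucketing compromise described in the question can be sketched as follows. This is a hedged illustration rather than a tested schema. The `YYYYMMDDHH` row-key format and the helper names are hypothetical, and no Cassandra client calls are shown. The idea is that each bucket row holds that hour's messages as columns (or super columns) ordered by timestamp. A range query then computes the list of bucket keys covering the range, fetches those rows, and uses a column slice to trim the first and last buckets.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed bucket width: one row per hour. The flooring in
# bucket_keys_for_range below only matches an hourly width.
BUCKET = timedelta(hours=1)


def bucket_key(ts: datetime) -> str:
    """Hypothetical row key: date + hour, e.g. '2010030122'."""
    return ts.strftime("%Y%m%d%H")


def bucket_keys_for_range(start: datetime, end: datetime) -> List[str]:
    """Return every hourly row key that overlaps [start, end], in order."""
    if end < start:
        raise ValueError("end must not precede start")
    # Floor the start to the beginning of its hour so the first partial
    # bucket is included.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(bucket_key(current))
        current += BUCKET
    return keys
```

For example, a range from 22:30 on one day to 01:15 the next touches four hourly rows. Keys like these should spread across the ring under the RandomPartitioner, at the cost of one row fetch per bucket. A finer bucket (date + hour + minute) keeps rows smaller but means more keys per query.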
