Help with schema modelling

Tristan Seligmann Sat, 16 Jul 2011 17:09:18 -0700

I'm trying to model a schema for a log storage system in
Cassandra. Each log message consists of a timestamp, a message, and
some other arbitrary key/value pairs. Queries would primarily be
based on timestamp ranges. I will probably also filter on matches
against the key/value pairs, but I expect to handle that by fetching
the messages in the desired time range and then filtering out the
uninteresting ones.
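
To make the fetch-then-filter step concrete, here is a minimal sketch in
Python. It assumes each fetched message is a plain dict of key/value
pairs; the function names and the sample keys ("host", "level") are
purely illustrative and not part of any Cassandra API.

```python
def matches(message, criteria):
    """True if every key in criteria is present in message with an equal value."""
    return all(key in message and message[key] == value
               for key, value in criteria.items())


def filter_messages(messages, criteria):
    """Keep only the messages (already fetched for a time range) that match."""
    return [m for m in messages if matches(m, criteria)]


# Hypothetical messages, as they might look after being fetched for a time range.
fetched = [
    {"timestamp": 1310861358, "message": "disk full", "host": "a", "level": "error"},
    {"timestamp": 1310861359, "message": "login ok", "host": "b", "level": "info"},
    {"timestamp": 1310861360, "message": "timeout", "host": "a", "level": "warn"},
]

errors_on_a = filter_messages(fetched, {"host": "a", "level": "error"})
```

Filtering this way costs a full read of the time range, so it only stays
cheap if the ranges being queried are reasonably narrow.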


A supercolumn makes it easy enough to store the key/value pairs as
columns, but then I end up with all of my log messages in a single
row, which obviously won't work. On the other hand, if I use the
timestamp as the row key, I need to use OPP (the order-preserving
partitioner) to query on ranges, and I'd prefer not to deal with the
load-balancing issues that would raise. I suppose I could go halfway:
use a prefix of the timestamp (e.g. date + hour, or perhaps date +
hour + minute) as the row key, and then retrieve all of the keys in
the range I'm interested in when performing a query.
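
As a rough sketch of the "halfway" idea, the Python below computes the
row keys that a time-range query would need to fetch, assuming one row
per hour. The key format (YYYYMMDDHH) and the names bucket_key and
bucket_keys are assumptions made for illustration; the actual reads
against Cassandra are left out.

```python
from datetime import datetime, timedelta


def bucket_key(ts):
    """Row key for the hour bucket containing ts (assumed format: YYYYMMDDHH)."""
    return ts.strftime("%Y%m%d%H")


def bucket_keys(start, end):
    """Row keys of every hour bucket overlapping the inclusive range [start, end]."""
    # Truncate the start to the beginning of its hour, then step hour by hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(bucket_key(current))
        current += timedelta(hours=1)
    return keys


# Example: a query spanning a little under three hours touches four hourly rows.
keys = bucket_keys(datetime(2011, 7, 16, 22, 30), datetime(2011, 7, 17, 1, 5))
```

Each key can then be read directly, so no ordered partitioner is needed;
the first and last rows will contain messages outside the requested
range, which have to be trimmed after fetching.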

I feel like I'm missing something, though, so I was hoping for some
advice from more experienced users of Cassandra.
-- 
mithrandi, i Ainil en-Balandor, a faer Ambar
