We use log structured tables to hold logs for analysis.

The data is basically append-only and immutable.  Every record carries a
timestamp set when it is inserted.

Having this in ONE big monolithic table can be problematic.

1.  compactions have to compact old data that might not even be used often.

2.  it might be nice to not have the old data touched on disk so that you
can just use it for map reduce.  Being able to fadvise away old data so
that it's not in cache can be valuable.

3.  the ability to drop large chunks of old data is also useful.  For
example, if you run out of disk space, you can just drop the oldest day's
worth of data without having to use tombstones.

MySQL has a somewhat decent partition engine:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

It seems like this could be easily implemented using a custom compaction
strategy.

Essentially, you would take the SSTables and first group them into
partitions.  So if you were using day partitions, you could take all
SSTables for that day, and then use another, nested compaction strategy,
like leveled, on just those SSTables.
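
A minimal sketch of that grouping step, in Python rather than as a real
Cassandra compaction strategy.  SSTableInfo, day_key, group_by_day and
compact_partition are hypothetical names made up for illustration, and the
sketch assumes each SSTable is assigned to a day by its newest timestamp
(an SSTable that spans midnight would need a real policy):

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SSTableInfo:
    # Hypothetical stand-in for SSTable metadata; not a Cassandra class.
    name: str
    max_timestamp: float  # newest record in the table, seconds since epoch


def day_key(ts: float) -> str:
    """Map a timestamp to its UTC day partition, e.g. '1970-01-02'."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


def group_by_day(sstables):
    """Group SSTables into day partitions keyed by their newest timestamp."""
    partitions = defaultdict(list)
    for table in sstables:
        partitions[day_key(table.max_timestamp)].append(table)
    return dict(partitions)


def compact_partition(tables):
    """Placeholder for the nested strategy (leveled, size-tiered, ...).

    It only reports which SSTables would be compacted together.
    """
    return sorted(table.name for table in tables)
```

The nested strategy only ever sees the SSTables of one day, so a day that
has stopped receiving writes stops being rewritten once it is compacted.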

The older days would yield one SSTable per day, once all the individual
SSTables are compacted.  So a month of data would need a minimum of about
30 SSTables, one per day.

You would need to implement some custom ways to prune the older partitions.
And you'd also need some way to define the partitions.
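
One possible pruning rule, sketched in Python under stated assumptions:
partitions are keyed by ISO day strings, their on-disk sizes are known, and
the goal is to get back under a disk budget by dropping whole days, oldest
first.  partitions_to_drop and its parameters are hypothetical names, not
an existing Cassandra API:

```python
def partitions_to_drop(partition_sizes, max_bytes):
    """Pick the oldest day partitions to drop to fit within max_bytes.

    partition_sizes maps a day key such as '2013-01-01' to its size in
    bytes.  Returns the day keys to drop, oldest first.
    """
    total = sum(partition_sizes.values())
    dropped = []
    # ISO 'YYYY-MM-DD' keys sort chronologically as plain strings.
    for day in sorted(partition_sizes):
        if total <= max_bytes:
            break
        total -= partition_sizes[day]
        dropped.append(day)
    return dropped
```

Because each dropped partition is a whole set of SSTables, the files could
simply be deleted; no tombstones are written.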

… but maybe an initial prototype could just read from a configuration file,
or another system table which defines them…
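
For the configuration-file variant, a prototype might read something like
the following.  The JSON layout, the table name logs.events, and the option
names partition_by, nested_strategy and keep_partitions are all invented
for this sketch; nothing here is an existing Cassandra setting:

```python
import json

# Hypothetical config: one entry per table that should be partitioned.
EXAMPLE_CONFIG = """
{
  "logs.events": {
    "partition_by": "day",
    "nested_strategy": "leveled",
    "keep_partitions": 30
  }
}
"""

VALID_UNITS = ("hour", "day", "month")


def load_partition_config(text):
    """Parse the partition definitions and reject unknown partition units."""
    config = json.loads(text)
    for table, options in config.items():
        unit = options.get("partition_by")
        if unit not in VALID_UNITS:
            raise ValueError(f"{table}: unsupported partition_by {unit!r}")
    return config
```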

(just thinking out loud)

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
