He said below that he’d like to keep the old data, so that might rule out TTLs in any case.
You’ve got a few options that I can think of off the top of my head. The easiest from a management perspective is to use one table per month. WhateverData042014 would be this month’s. It’s easy enough to back up sstables: you just copy them off somewhere. You could compact the previous month’s table at the beginning of the following month and copy the sstables off for archiving, in S3 or something similar.

Depending on where you end up moving the data, it might be more trouble than it’s worth, since you might need to come up with a backup plan, and now you’ll have 2 things to back up instead of just 1. Also, restoring the data is more of a pain than just querying it.

On Apr 28, 2014, at 12:57 PM, Donald Smith <donald.sm...@audiencescience.com> wrote:

> CQL lets you specify a default TTL per column family/table: and
> default_time_to_live=86400 .
>
> From: Redmumba [mailto:redmu...@gmail.com]
> Sent: Monday, April 28, 2014 12:51 PM
> To: user@cassandra.apache.org
> Subject: Re: Cassandra data retention policy
>
> Have you looked into using a TTL? You can set this per insert
> (unfortunately, it can't be set per CF) and values will be tombstoned after
> that amount of time. I.e.,
>
> INSERT INTO .... VALUES ... TTL 15552000
>
> Keep in mind, after the values have expired, they will essentially become
> tombstones--so you will still need to run clean-ups (probably daily) to clear
> up space.
>
> Does this help?
>
> One caveat is that this is difficult to apply to existing rows--i.e., you
> can't bulk-update a bunch of rows with this data. As such, another good
> suggestion is to simply have a secondary index on a date field of some kind,
> and run a bulk remove (and subsequent clean-up) daily/weekly/whatever.
>
>
> On Mon, Apr 28, 2014 at 11:31 AM, Han Jia <johnideal...@gmail.com> wrote:
> Hi guys,
>
>
> We have a processing system that just uses the data for the past six months
> in Cassandra. Any suggestions on the best way to manage the old data in order
> to save disk space? We want to keep it as backup but it will not be used
> unless we need to do recovery. Thanks in advance!
>
>
> -John
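
For what it's worth, here is a rough sketch of the table-per-month naming from the reply at the top of this message. It assumes the MMYYYY suffix implied by the WhateverData042014 example; the helper names monthly_table_name and previous_month are made up for illustration and are not part of Cassandra or any driver.

```python
from datetime import date


def monthly_table_name(day: date, prefix: str = "WhateverData") -> str:
    """Name of the table for the month containing `day` (assumed MMYYYY suffix)."""
    return f"{prefix}{day:%m%Y}"


def previous_month(day: date) -> date:
    """First day of the month before the one containing `day`."""
    if day.month == 1:
        return date(day.year - 1, 12, 1)
    return date(day.year, day.month - 1, 1)


today = date(2014, 4, 28)
current = monthly_table_name(today)                      # table receiving writes
to_archive = monthly_table_name(previous_month(today))   # table to compact and copy off
print(current, to_archive)  # WhateverData042014 WhateverData032014
```

Run at the start of each month, something like this tells you which table is now cold and can be compacted and have its sstables copied off.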