TTL is good for this, but I don't see how you would ever restore data
that has been removed from disk that way. Perhaps one could take a
snapshot, delete everything with a timestamp older than a given date,
and then run a compaction on every node to reclaim the disk space.


2014-04-28 21:57 GMT+02:00 Donald Smith <donald.sm...@audiencescience.com>:

>  CQL lets you specify a default TTL per column family/table, e.g. with
> default_time_to_live = 86400.
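>
> For example (a minimal sketch; the table is made up):
>
>     CREATE TABLE events (
>         id text PRIMARY KEY,
>         payload text
>     ) WITH default_time_to_live = 86400;
>
>     -- or, for an existing table:
>     ALTER TABLE events WITH default_time_to_live = 86400;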
>
> *From:* Redmumba [mailto:redmu...@gmail.com]
> *Sent:* Monday, April 28, 2014 12:51 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra data retention policy
>
> Have you looked into using a TTL?  You can set this per insert
> (unfortunately, it can't be set per CF) and values will be tombstoned after
> that amount of time.  I.e.,
>
>     INSERT INTO ... (...) VALUES (...) USING TTL 15552000;
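>
> Or, spelled out against a made-up table (the values expire after 180
> days, i.e. 15552000 seconds):
>
>     INSERT INTO events (id, payload)
>     VALUES ('abc123', 'some data')
>     USING TTL 15552000;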
>
> Keep in mind that after the values have expired they essentially become
> tombstones, so you will still need to run compactions (probably daily) to
> actually reclaim the space.
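>
> For example, a daily cron job along these lines (the names are
> placeholders) would force a major compaction; note that tombstones are
> only actually purged once gc_grace_seconds has passed:
>
>     nodetool compact my_keyspace events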
>
> Does this help?
>
> One caveat is that this is difficult to apply to existing rows; that is,
> you can't bulk-update a bunch of existing rows with a TTL.  As such,
> another good option is to keep a secondary index on a date field of some
> kind and run a bulk remove (and a subsequent compaction)
> daily/weekly/whatever.
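>
> A rough sketch of that approach (the index and columns are made up;
> query by an indexed day bucket, then delete by primary key):
>
>     CREATE INDEX events_day_idx ON events (day);
>
>     SELECT id FROM events WHERE day = '2013-10-01';
>     -- then, for each id returned:
>     DELETE FROM events WHERE id = 'abc123';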
>
> On Mon, Apr 28, 2014 at 11:31 AM, Han Jia <johnideal...@gmail.com> wrote:
>
> Hi guys,
>
> We have a processing system that only uses the past six months of data
> in Cassandra. Any suggestions on the best way to manage the older data
> in order to save disk space? We want to keep it as a backup, but it will
> not be used unless we need to do a recovery. Thanks in advance!
>
> -John