Hello Wim

TTL is a good fit for your requirement if you want Cassandra to handle the
deletion task for you.

Now, clearly there are 2 strategies:

1) Store data on the same partition (physical row) and set TTL to expire
data automatically
2) Store data on several partitions, one for each day for example, and
manage deletion manually or use TTL again
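
To make the two strategies concrete, here is a rough CQL sketch. The UUIDs, the 30-day TTL, and the table name event_message_by_day with its "day" column are made up for illustration; only event_message comes from your schema.

```cql
-- Strategy 1: keep your existing table and let each row expire on its own.
-- 2592000 seconds = 30 days (example value; use the user-configured retention).
INSERT INTO event_message (message_source_id, message_time, message_id, event_type_id)
VALUES (123e4567-e89b-12d3-a456-426655440000,
        '2014-06-30 09:59:00+0000',
        6ba7b810-9dad-11d1-80b4-00c04fd430c8,
        'alarm')
USING TTL 2592000;

-- Strategy 2: hypothetical variant of the table with one partition
-- per source and per day. "day" is a bucket such as '2014-06-30'.
CREATE TABLE event_message_by_day (
    message_source_id uuid,
    day text,
    message_time timestamp,
    message_id uuid,
    event_type_id varchar,
    PRIMARY KEY ((message_source_id, day), message_time, message_id)
);

-- Manual cleanup: drop one whole day for one source in a single statement.
DELETE FROM event_message_by_day
WHERE message_source_id = 123e4567-e89b-12d3-a456-426655440000
  AND day = '2014-06-30';
```

With the composite partition key, the DELETE has to name both message_source_id and day, so the application needs to know which sources wrote data on the day being purged.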

If you have little data, strategy 1 is fine. If your data is huge and/or you
need to reclaim disk space quickly (especially with the big binary file),
you'll probably be better off choosing strategy 2. The only drawback of
strategy 2 is that when you need to query data spanning several days,
you'll have to issue many queries (one for each distinct day) or use the
"IN" clause of CQL3, which has a small performance overhead.
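
As an illustration of the multi-day read, assuming a hypothetical table event_message_by_day whose partition key is (message_source_id, day), with day stored as text like '2014-06-30' and message_time as the first clustering column, a query over three days could look roughly like this. The client has to work out the list of days covered by the interval itself.

```cql
-- Assumed schema (not from the original table):
--   PRIMARY KEY ((message_source_id, day), message_time, message_id)
-- One query touching three day partitions of the same source.
SELECT message_time, message_id, event_type_id
FROM event_message_by_day
WHERE message_source_id = 123e4567-e89b-12d3-a456-426655440000
  AND day IN ('2014-06-28', '2014-06-29', '2014-06-30')
  AND message_time >= '2014-06-28 12:00:00+0000'
  AND message_time <  '2014-06-30 12:00:00+0000';
```

The alternative is to send one query per day (same statement with day = ? instead of IN) and merge the results on the client.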

Do not forget to set gc_grace_seconds to 0 to have data removed quickly.
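
For reference, the setting is a per-table option, so on your table it would be changed along these lines (a sketch; the default is 864000 seconds, i.e. 10 days):

```cql
-- Let expired/deleted data be purged at the next compaction
-- instead of waiting out the default 10-day grace period.
ALTER TABLE event_message WITH gc_grace_seconds = 0;
```

One caveat worth keeping in mind: on a multi-node cluster where you also issue explicit deletes, a very low gc_grace_seconds can let deleted data reappear if a replica missed the delete and was not repaired in time.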

As for notifications, it's not possible right now to be notified on the
client side when an expiring column (column with TTL) is physically removed
by Cassandra.




On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <wim.debla...@gmail.com>
wrote:

> Hi,
>
> I am getting started with Cassandra (coming from MySQL). I have made a
> table with timeseries data (inspired on
> http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
> ).
>
> The table looks like this:
>
> CREATE TABLE event_message (
> message_id uuid,
> message_source_id uuid,
> message_time timestamp,
> event_type_id varchar,
> event_state varchar,
> filter_state varchar,
> image_id uuid,
> device_specific_id bigint,
> device_specific_begin_id bigint,
> characteristics varchar,
> PRIMARY KEY (message_source_id, message_time, message_id)
> );
>
> I have now 2 requirements:
> 1) I need to remove rows after a certain (user settable) time (between 5
> and 60 days). In MySQL, we used partitions by day to quickly delete a whole
> day.
> 2) I need to store a big binary file along with each row and this file
> should be removed when the row is removed.
>
> I was looking into the expiring columns (with the TTL), but is this a good
> fit for this use case? Is this TTL stored between restarts of Cassandra?
>
> Would there be any advantage to use the system called "Partitioning to
> limit row size – Time Series Pattern 2" in the URL and then explicitly
> doing a delete of a whole day? With this system, if I query by time, do I
> need to calculate what days are in the interval and explicitly add this in
> my query to find the good partitions?
>
> How can I get notifications if a row is expired when using TTL so I can
> removed the associated file?
>
> regards,
>
> Wim
>
