Hi,

Thanks for the answers.

Are you saying that I could store big binary files in Cassandra? I have
read somewhere that if a file is larger than 10 MB, it is probably not such
a good idea? The binary files can be up to 50 or 100 MB, no more in my case.

So the way I understand it, if I store the binary files outside of
Cassandra, I need to delete them manually and go with strategy 2, since there
are no notifications.
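
To make the bookkeeping behind strategy 2 concrete, here is a minimal
sketch of what I think it implies. It assumes one partition (and one group
of files) per UTC day; the function names and the per-day layout are my own
assumptions for illustration, not something taken from the linked article.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable, List


def day_buckets(start: datetime, end: datetime) -> List[date]:
    """Return every UTC calendar day touched by the interval [start, end].

    Both arguments are expected to be timezone-aware datetimes.
    Each returned day stands for one partition to query (hypothetical layout).
    """
    if end < start:
        raise ValueError("end must not be before start")
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def expired_buckets(known_days: Iterable[date], today: date,
                    retention_days: int) -> List[date]:
    """Return the day buckets strictly older than the retention period.

    These are the days whose rows and associated files could be deleted.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in known_days if d < cutoff)
```

A query over a time range would then be issued once per day returned by
day_buckets (or with all of those days in one "IN" clause), and a periodic
job could remove the files belonging to the days returned by
expired_buckets.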

regards,

Wim


2014-06-30 10:23 GMT+02:00 DuyHai Doan <doanduy...@gmail.com>:

> Hello Wim
>
> TTL is a good fit for your requirement if you want Cassandra to handle the
> deletion task for you.
>
> Now, clearly there are 2 strategies:
>
> 1) Store data on the same partition (physical row) and set TTL to expire
> data automatically
> 2) Store data on several partitions, one for each day for example, and
> manage deletion manually or use TTL again
>
> If you have little data, strategy 1 is fine. If your data is huge and/or you
> need to reclaim disk space quickly (especially with the big binary files),
> you'll probably be better off choosing strategy 2. The only drawback of
> strategy 2 is that when you need to query data spanning several days,
> you'll have to issue many queries (one for each distinct day) or use the
> "IN" clause of CQL3, which has a small performance overhead.
>
> Do not forget to set gc_grace_seconds to 0 to have data removed quickly.
>
> As for notifications, it is not possible right now to be notified on the
> client side when an expiring column (a column with a TTL) is physically
> removed by Cassandra.
>
>
>
>
> On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <wim.debla...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am getting started with Cassandra (coming from MySQL). I have made a
>> table with time series data (inspired by
>> http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
>> ).
>>
>> The table looks like this:
>>
>> CREATE TABLE event_message (
>>     message_id uuid,
>>     message_source_id uuid,
>>     message_time timestamp,
>>     event_type_id varchar,
>>     event_state varchar,
>>     filter_state varchar,
>>     image_id uuid,
>>     device_specific_id bigint,
>>     device_specific_begin_id bigint,
>>     characteristics varchar,
>>     PRIMARY KEY (message_source_id, message_time, message_id)
>> );
>>
>> I now have 2 requirements:
>> 1) I need to remove rows after a certain (user-settable) time (between 5
>> and 60 days). In MySQL, we used partitions by day to quickly delete a whole
>> day.
>> 2) I need to store a big binary file along with each row and this file
>> should be removed when the row is removed.
>>
>> I was looking into expiring columns (with a TTL), but is this a
>> good fit for this use case? Is the TTL preserved across restarts of
>> Cassandra?
>>
>> Would there be any advantage to using the approach called "Partitioning to
>> limit row size – Time Series Pattern 2" in the URL and then explicitly
>> deleting a whole day? With this approach, if I query by time, do I
>> need to calculate which days are in the interval and explicitly add them to
>> my query to find the right partitions?
>>
>> How can I get notified when a row expires when using TTL, so I can
>> remove the associated file?
>>
>> regards,
>>
>> Wim
>>
>
>