Re: Best way to delete by day?

DuyHai Doan Mon, 30 Jun 2014 01:43:32 -0700

It is recommended to split your big binary data into small chunks of
5Mb/10Mb if you want to store them in C*.  Astyanax framework can help with
this.


If you store binary files outside C*, yes, you'll need to manage the
deletion manually, at the same time you delete the partition for a day of
data in C*


On Mon, Jun 30, 2014 at 10:28 AM, Wim Deblauwe <wim.debla...@gmail.com>
wrote:

> Hi,
>
> Thanks for the answers.
>
> Are you saying that I could store big binary files in Cassandra ? I have
> read somewhere that if the file is more than 10 Mb, it is probably not such
> a good idea? The binary files can be up to 50 or 100 Mb, no more in my case.
>
> So the way I understand it, if I store the binary file outside of
> Cassandra, I need to delete manually and go with strategy 2 since there are
> no notifications.
>
> regards,
>
> Wim
>
>
> 2014-06-30 10:23 GMT+02:00 DuyHai Doan <doanduy...@gmail.com>:
>
> Hello Wim
>>
>> TTL is a good fit for your requirement if you want Cassandra to handle
>> the deletion task for you.
>>
>> Now, clearly there are 2 strategies:
>>
>> 1) Store data on the same partition (physical row) and set TTL to expire
>> data automatically
>> 2) Store data on several partitions, one for each day for example, and
>> manage deletion manually or use TTL again
>>
>> If you have few data, strategy 1 is fine. If your data is huge and/or you
>> need to reclaim disk space quickly (especially with the big binary file),
>> you'll probably better off choosing strategy 2. The only drawback with
>> strategy 2 is when you need querying data that span over several days,
>> you'll have to issue many queries (one for each distinct day) or use the
>> "IN" clause of CQL3 but this has a small performance overhead since.
>>
>> Do not forget to set gc_grace_seconds to 0 to have data removed quickly.
>>
>> About notification, it's not possible right now to be notified on the
>> client side when an expiring column (column with TTL) is physically removed
>> by Cassandra
>>
>>
>>
>>
>> On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <wim.debla...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am getting started with Cassandra (coming from MySQL). I have made a
>>> table with timeseries data (inspired on
>>> http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
>>> ).
>>>
>>> The table looks like this:
>>>
>>> CREATE TABLE event_message (
>>> message_id uuid,
>>> message_source_id uuid,
>>> message_time timestamp,
>>> event_type_id varchar,
>>> event_state varchar,
>>> filter_state varchar,
>>> image_id uuid,
>>> device_specific_id bigint,
>>> device_specific_begin_id bigint,
>>> characteristics varchar,
>>> PRIMARY KEY (message_source_id, message_time, message_id)
>>> );
>>>
>>> I have now 2 requirements:
>>> 1) I need to remove rows after a certain (user settable) time (between 5
>>> and 60 days). In MySQL, we used partitions by day to quickly delete a whole
>>> day.
>>> 2) I need to store a big binary file along with each row and this file
>>> should be removed when the row is removed.
>>>
>>> I was looking into the expiring columns (with the TTL), but is this a
>>> good fit for this use case? Is this TTL stored between restarts of
>>> Cassandra?
>>>
>>> Would there be any advantage to use the system called "Partitioning to
>>> limit row size – Time Series Pattern 2" in the URL and then explicitly
>>> doing a delete of a whole day? With this system, if I query by time, do I
>>> need to calculate what days are in the interval and explicitly add this in
>>> my query to find the good partitions?
>>>
>>> How can I get notifications if a row is expired when using TTL so I can
>>> removed the associated file?
>>>
>>> regards,
>>>
>>> Wim
>>>
>>
>>
>

Re: Best way to delete by day?

Reply via email to