Hi,

Thanks for the answers.

Are you saying that I could store big binary files in Cassandra? I have read somewhere that if a file is larger than 10 MB, it is probably not such a good idea. The binary files can be up to 50 or 100 MB in my case, no more.

So the way I understand it, if I store the binary files outside of Cassandra, I need to delete them manually and go with strategy 2, since there are no notifications.

regards,

Wim

2014-06-30 10:23 GMT+02:00 DuyHai Doan <doanduy...@gmail.com>:

> Hello Wim
>
> TTL is a good fit for your requirement if you want Cassandra to handle the
> deletion task for you.
>
> Now, clearly there are 2 strategies:
>
> 1) Store data on the same partition (physical row) and set TTL to expire
> data automatically
> 2) Store data on several partitions, one for each day for example, and
> manage deletion manually or use TTL again
>
> If you have little data, strategy 1 is fine. If your data is huge and/or you
> need to reclaim disk space quickly (especially with the big binary file),
> you'll probably be better off choosing strategy 2. The only drawback with
> strategy 2 is that when you need to query data that spans several days,
> you'll have to issue many queries (one for each distinct day) or use the
> "IN" clause of CQL3, but this has a small performance overhead.
>
> Do not forget to set gc_grace_seconds to 0 to have data removed quickly.
>
> About notification, it's not possible right now to be notified on the
> client side when an expiring column (column with TTL) is physically removed
> by Cassandra.
>
> On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <wim.debla...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am getting started with Cassandra (coming from MySQL). I have made a
>> table with time series data (inspired by
>> http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
>> ).
>>
>> The table looks like this:
>>
>> CREATE TABLE event_message (
>>     message_id uuid,
>>     message_source_id uuid,
>>     message_time timestamp,
>>     event_type_id varchar,
>>     event_state varchar,
>>     filter_state varchar,
>>     image_id uuid,
>>     device_specific_id bigint,
>>     device_specific_begin_id bigint,
>>     characteristics varchar,
>>     PRIMARY KEY (message_source_id, message_time, message_id)
>> );
>>
>> I now have 2 requirements:
>> 1) I need to remove rows after a certain (user-settable) time (between 5
>> and 60 days). In MySQL, we used partitions by day to quickly delete a whole
>> day.
>> 2) I need to store a big binary file along with each row, and this file
>> should be removed when the row is removed.
>>
>> I was looking into the expiring columns (with the TTL), but is this a
>> good fit for this use case? Is the TTL preserved across restarts of
>> Cassandra?
>>
>> Would there be any advantage to using the system called "Partitioning to
>> limit row size – Time Series Pattern 2" in the URL and then explicitly
>> doing a delete of a whole day? With this system, if I query by time, do I
>> need to calculate which days are in the interval and explicitly add them to
>> my query to find the right partitions?
>>
>> How can I get notifications when a row expires when using TTL, so I can
>> remove the associated file?
>>
>> regards,
>>
>> Wim
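
On the question of querying by time under strategy 2: the client does have to work out which day partitions an interval touches. Below is a minimal sketch of that calculation in Python, not a definitive implementation. It assumes a hypothetical table `event_message_by_day` whose partition key is (message_source_id, day), with `day` stored as text such as '2014-06-30'; neither the table nor the column exists in the schema above. It only builds the statements and parameters, one per day, and leaves execution to whatever driver is in use.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical table and column names (assumptions, not part of the thread's schema).
QUERY = (
    "SELECT * FROM event_message_by_day "
    "WHERE message_source_id = %s AND day = %s "
    "AND message_time >= %s AND message_time <= %s"
)


def day_buckets(start: datetime, end: datetime) -> List[str]:
    """Return every calendar day touched by [start, end] as 'YYYY-MM-DD' text."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    first, last = start.date(), end.date()
    span = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(span + 1)]


def queries_for_interval(source_id, start: datetime, end: datetime) -> List[Tuple[str, tuple]]:
    """Build one (statement, parameters) pair per day partition in the interval."""
    return [(QUERY, (source_id, day, start, end)) for day in day_buckets(start, end)]
```

The days are computed in whatever time zone the datetime values carry, so the writer and the reader need to agree on one convention (UTC, for example) when deriving the `day` value.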