It is recommended to split your big binary data into small chunks of 5Mb/10Mb if you want to store them in C*. Astyanax framework can help with this.
If you store binary files outside C*, yes, you'll need to manage the deletion manually, at the same time you delete the partition for a day of data in C* On Mon, Jun 30, 2014 at 10:28 AM, Wim Deblauwe <wim.debla...@gmail.com> wrote: > Hi, > > Thanks for the answers. > > Are you saying that I could store big binary files in Cassandra ? I have > read somewhere that if the file is more than 10 Mb, it is probably not such > a good idea? The binary files can be up to 50 or 100 Mb, no more in my case. > > So the way I understand it, if I store the binary file outside of > Cassandra, I need to delete manually and go with strategy 2 since there are > no notifications. > > regards, > > Wim > > > 2014-06-30 10:23 GMT+02:00 DuyHai Doan <doanduy...@gmail.com>: > > Hello Wim >> >> TTL is a good fit for your requirement if you want Cassandra to handle >> the deletion task for you. >> >> Now, clearly there are 2 strategies: >> >> 1) Store data on the same partition (physical row) and set TTL to expire >> data automatically >> 2) Store data on several partitions, one for each day for example, and >> manage deletion manually or use TTL again >> >> If you have few data, strategy 1 is fine. If your data is huge and/or you >> need to reclaim disk space quickly (especially with the big binary file), >> you'll probably better off choosing strategy 2. The only drawback with >> strategy 2 is when you need querying data that span over several days, >> you'll have to issue many queries (one for each distinct day) or use the >> "IN" clause of CQL3 but this has a small performance overhead since. >> >> Do not forget to set gc_grace_seconds to 0 to have data removed quickly. >> >> About notification, it's not possible right now to be notified on the >> client side when an expiring column (column with TTL) is physically removed >> by Cassandra >> >> >> >> >> On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <wim.debla...@gmail.com> >> wrote: >> >>> Hi, >>> >>> I am getting started with Cassandra (coming from MySQL). I have made a >>> table with timeseries data (inspired on >>> http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/ >>> ). >>> >>> The table looks like this: >>> >>> CREATE TABLE event_message ( >>> message_id uuid, >>> message_source_id uuid, >>> message_time timestamp, >>> event_type_id varchar, >>> event_state varchar, >>> filter_state varchar, >>> image_id uuid, >>> device_specific_id bigint, >>> device_specific_begin_id bigint, >>> characteristics varchar, >>> PRIMARY KEY (message_source_id, message_time, message_id) >>> ); >>> >>> I have now 2 requirements: >>> 1) I need to remove rows after a certain (user settable) time (between 5 >>> and 60 days). In MySQL, we used partitions by day to quickly delete a whole >>> day. >>> 2) I need to store a big binary file along with each row and this file >>> should be removed when the row is removed. >>> >>> I was looking into the expiring columns (with the TTL), but is this a >>> good fit for this use case? Is this TTL stored between restarts of >>> Cassandra? >>> >>> Would there be any advantage to use the system called "Partitioning to >>> limit row size – Time Series Pattern 2" in the URL and then explicitly >>> doing a delete of a whole day? With this system, if I query by time, do I >>> need to calculate what days are in the interval and explicitly add this in >>> my query to find the good partitions? >>> >>> How can I get notifications if a row is expired when using TTL so I can >>> removed the associated file? >>> >>> regards, >>> >>> Wim >>> >> >> >