Thanks for the answer, but it is already set to 0 since I don't do any deletes.
Cem

On Tue, May 28, 2013 at 9:03 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You need to change the gc_grace time of the column family. It defaults
> to 10 days. By default the tombstones will not go away for 10 days.
>
> On Tue, May 28, 2013 at 2:46 PM, cem <cayiro...@gmail.com> wrote:
>
>> Hi Experts,
>>
>> We have a general problem with cleaning up data from the disk. I need
>> to free the disk space after the retention period, and the customer
>> wants to dimension the disk space based on that.
>>
>> After running multiple performance tests with a TTL of 1 day, we saw
>> that compaction couldn't keep up with the request rate. Disks were
>> getting full after 3 days, and there were still a lot of sstables
>> older than 1 day on disk after 3 days.
>>
>> Things that we tried:
>>
>> - Change the compaction strategy to leveled. (helped a bit but not much)
>>
>> - Use a big sstable size (10G) with leveled compaction to get more
>> aggressive compaction. (helped a bit but not much)
>>
>> - Upgrade Cassandra from 1.0 to 1.2 to use TTL histograms. (didn't
>> help at all, since the key-overlap estimation algorithm generates a
>> 100% match, although we don't have...)
>>
>> Our column family structure is like this:
>>
>> Event_data_cf: (we store event data. Event_id is randomly generated,
>> and each event has attributes like location=london)
>>
>> row       | data
>> event id  | data blob
>>
>> timeseries_cf: (the key is the attribute that we want to index. It can
>> be location=london; we didn't use secondary indexes because the
>> indexes are dynamic.)
>>
>> row       | data
>> index key | time series of event ids (event1_id, event2_id, ...)
>>
>> timeseries_inv_cf: (this is used for removing an event by its row key.)
>>
>> row       | data
>> event id  | set of index keys
>>
>> Candidate solution: implementing time range partitions.
>>
>> Each partition will have its own set of column families and will be
>> managed by the client.
>>
>> Suppose that you want a 7-day retention period. Then you can configure
>> the partition size as 1 day and have 7 active partitions at any time,
>> and drop inactive partitions (older than 7 days). Dropping immediately
>> removes the data from the disk (with the proper cassandra.yaml
>> configuration).
>>
>> Storing an event:
>>
>> Find the current partition p1
>>
>> store the event data to Event_data_cf_p1
>>
>> store the indexes to timeseries_cf_p1
>>
>> store the inverted indexes to timeseries_inv_cf_p1
>>
>> A time range query with an index:
>>
>> Find all partitions that belong to that time range
>>
>> Read starting from the first partition until you reach the limit
>>
>> .....
>>
>> Could you please provide your comments and concerns?
>>
>> Is there any other option that we can try?
>>
>> What do you think about the candidate solution?
>>
>> Does anyone have the same issue? How would you solve it another way?
>>
>> Thanks in advance!
>>
>> Cem
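To make the candidate solution above a bit more concrete, here is a minimal Java sketch of the client-side bookkeeping, assuming day-sized partitions identified by a pYYYYMMDD suffix appended to each column family name. The suffix format and all class/method names are my own illustration, not an existing implementation:

import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Client-managed time range partitions, one per day.
// Naming scheme (assumed): Event_data_cf_p20130528, timeseries_cf_p20130528, ...
public class PartitionManager {

    static final int RETENTION_DAYS = 7; // 7 active day-sized partitions

    static final String[] BASE_CFS = {
        "Event_data_cf", "timeseries_cf", "timeseries_inv_cf"
    };

    // Partition suffix for an event timestamp, e.g. "p20130528".
    static String partitionSuffix(Instant eventTime) {
        LocalDate day = eventTime.atZone(ZoneOffset.UTC).toLocalDate();
        return String.format("p%04d%02d%02d",
                day.getYear(), day.getMonthValue(), day.getDayOfMonth());
    }

    // Column families one event write fans out to; the caller issues the
    // three writes (data, index, inverted index) against these names.
    static List<String> writeTargets(Instant eventTime) {
        String suffix = partitionSuffix(eventTime);
        List<String> targets = new ArrayList<>();
        for (String cf : BASE_CFS) {
            targets.add(cf + "_" + suffix);
        }
        return targets;
    }

    // Given the suffixes of all existing partitions, return those that
    // fell out of the retention window. Dropping their column families
    // removes the data from disk immediately.
    static List<String> expired(List<String> existingSuffixes, Instant now) {
        // Today plus the previous 6 days stay active (7 partitions total).
        String oldestKept = partitionSuffix(
                now.minus(RETENTION_DAYS - 1, ChronoUnit.DAYS));
        List<String> result = new ArrayList<>();
        for (String suffix : existingSuffixes) {
            // Fixed-width pYYYYMMDD suffixes sort chronologically.
            if (suffix.compareTo(oldestKept) < 0) {
                result.add(suffix);
            }
        }
        return result;
    }
}

The partition choice is pure naming arithmetic on the client; Cassandra only ever sees independent column families, which is what makes dropping a whole day's worth of data cheap.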
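And the read side of the query flow, again only a sketch under the same naming assumption; readIndexRow below is a placeholder for whatever client call actually fetches a time series row from one partition:

import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Walk the partitions that cover the queried time range and stop once
// enough events are collected.
public class RangeQuery {

    // All partition suffixes covered by [from, to], oldest first.
    static List<String> suffixesInRange(Instant from, Instant to) {
        List<String> suffixes = new ArrayList<>();
        LocalDate day = from.atZone(ZoneOffset.UTC).toLocalDate();
        LocalDate last = to.atZone(ZoneOffset.UTC).toLocalDate();
        while (!day.isAfter(last)) {
            suffixes.add(String.format("p%04d%02d%02d",
                    day.getYear(), day.getMonthValue(), day.getDayOfMonth()));
            day = day.plusDays(1);
        }
        return suffixes;
    }

    // Query an index key (e.g. "location=london") over a time range,
    // reading partition by partition until the limit is reached.
    static List<String> query(String indexKey, Instant from, Instant to, int limit) {
        List<String> eventIds = new ArrayList<>();
        for (String suffix : suffixesInRange(from, to)) {
            eventIds.addAll(readIndexRow("timeseries_cf_" + suffix, indexKey));
            if (eventIds.size() >= limit) {
                return eventIds.subList(0, limit);
            }
        }
        return eventIds;
    }

    // Placeholder: a real client would fetch the event-id columns of the
    // given row from the given column family here.
    static List<String> readIndexRow(String columnFamily, String rowKey) {
        return new ArrayList<>();
    }
}

Walking the partitions oldest first lets the query stop as soon as the limit is reached; a newest-first walk would work the same way if queries usually want the most recent events.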