Hello,

You should never run `nodetool compact` here, since it merges everything into a single massive SSTable that will take a very long time to be compacted out, if it ever is.
You are correct that there need to be 4 similar-sized SSTables (the default `min_threshold`) before STCS will compact them. If you want the expired data to be deleted sooner, try lowering the STCS `min_threshold` to 3 or even 2.

Good luck!

Cheers,
Erick

On Sat, Sep 26, 2015 at 4:40 AM, Dongfeng Lu <dlu66...@yahoo.com> wrote:

> Hi
>
> I have a table where I set the TTL to only 7 days for all records, and we
> keep pumping records in every day. In general, I would expect all data
> files for that table to have timestamps less than, say, 8 or 9 days old,
> giving the system some time to work its magic. However, I occasionally see
> some files more than 9 days old. Last Friday, I saw 4 large files, each
> about 10 GB in size, with timestamps about 5, 4, 3, and 2 weeks old.
> Interestingly, they were all gone this Monday, leaving 1 new file 9 GB in
> size.
>
> The compaction strategy is SizeTieredCompactionStrategy, and I can
> understand why the above happened. It seems we have 10 GB of data every
> week, and when SizeTieredCompactionStrategy works to create various tiers,
> it just happens that the file size for the next tier is 10 GB, and all the
> data is packed into this huge file. Then it starts the next cycle. Another
> week goes by, and another 10 GB file is created. This process continues
> until the minimum number of files of the same size is reached, which I
> think is 4 by default. Then it starts to compact this set of four 10 GB
> files. By this time, all data in these 4 files has expired, so we end up
> with nothing, or with a much smaller file if there are still some records
> with TTL left.
>
> I have many tables like this, and I'd like to reclaim that space sooner.
> What would be the best way to do it? Should I run "nodetool compact" when I
> see two large files that are 2 weeks old? Are there configuration
> parameters I can tune to achieve the same effect? I looked through all the
> CQL Compaction Subproperties for STCS, but I am not sure how they can help
> here. Any suggestion is welcome.
>
> BTW, I am using Cassandra 2.0.6.
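
For reference, the `min_threshold` change suggested in the reply could be made with a CQL statement along the lines of the sketch below. The keyspace and table names (`my_keyspace.events`) are hypothetical placeholders, and the value 2 is just one of the two values mentioned in the reply. Treat it as a starting point to try on one table first, not a tested recommendation.

```cql
-- Hypothetical keyspace/table; substitute your own.
-- Keeps SizeTieredCompactionStrategy but lets a compaction start once
-- 2 similar-sized SSTables exist, instead of the default 4.
ALTER TABLE my_keyspace.events
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'min_threshold': 2
  };
```

With the weekly 10 GB files described in the original message, this should mean a compaction becomes possible after roughly 2 weeks rather than 4. The trade-off is that a lower threshold triggers compactions more often, so disk I/O is worth watching after the change.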