It might... If you have the disk space, a major compaction would be better, or a user-defined compaction on the large/old SSTable. Better yet, if you're on a recent version you can do a splitting major compaction (all of these options are available through *nodetool compact*).
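Roughly, those invocations look like the following on a recent version (the keyspace, table, and file names are placeholders; check *nodetool help compact* on your release, since the available flags have changed over time):

----
# Major compaction of one table (merges all of its SSTables)
nodetool compact <keyspace> <table>

# Splitting major compaction: avoids producing one huge SSTable by
# splitting the output into progressively smaller files (newer versions)
nodetool compact -s <keyspace> <table>

# User-defined compaction of specific SSTable files (newer versions)
nodetool compact --user-defined /var/lib/cassandra/data/<ks>/<table>/<sstable>-Data.db
----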
On 11 December 2017 at 07:41, tak...@fujitsu.com <tak...@fujitsu.com> wrote:

> Hi Jeff,
>
> I appreciate your detailed explanation :)
>
> > Expired data gets purged on compaction as long as it doesn't overlap
> > with other live data. The overlap thing can be difficult to reason about,
> > but it's meant to ensure correctness in the event that you write a value
> > with ttl 180, then another value with ttl 1, and you don't want to remove
> > the value with ttl 1 until you've also removed the value with ttl 180,
> > since it would lead to data being resurrected.
>
> I understand that the TTL setting sometimes does not work as we expect,
> especially when we alter the value afterward, because of Cassandra's data
> consistency functionality. Is my understanding correct?
>
> I am also thinking of trying the sstablesplit utility to let Cassandra do
> a minor compaction, because one of the SSTables is the oldest and very
> large, and I want to compact it.
>
> Do you think my plan works as expected?
>
> Regards,
> Takashima
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Monday, December 11, 2017 3:36 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Tombstoned data seems to remain after compaction
>
> Replies inline
>
> On Dec 10, 2017, at 9:59 PM, "tak...@fujitsu.com" <tak...@fujitsu.com> wrote:
>
> > Hi Jeff,
> >
> > > Are all of your writes TTL'd in this table?
> >
> > Yes. We set TTL to 180 days at first, and then altered it to just 1 day
> > because we noticed the first TTL setting was too long.
>
> Ok, this is different - Kurt's answer is true when you issue explicit
> deletes. Expiring data is slightly different.
>
> Expired data gets purged on compaction as long as it doesn't overlap with
> other live data. The overlap thing can be difficult to reason about, but
> it's meant to ensure correctness in the event that you write a value with
> ttl 180, then another value with ttl 1, and you don't want to remove the
> value with ttl 1 until you've also removed the value with ttl 180, since
> it would lead to data being resurrected.
>
> This is the primary reason that ttl'd data doesn't get cleaned up when
> people expect.
>
> > > Which compaction strategy are you using?
> >
> > We use Size Tiered Compaction Strategy.
>
> LCS would compact more aggressively and try to minimize overlaps.
>
> TWCS is designed for expiring data and tries to group data by time window
> for more efficient expiration.
>
> You would likely benefit from changing to either of those - but you'll
> want to try it on a single node first to confirm (you should be able to
> find videos online about using JMX to change the compaction strategy of a
> single node).
>
> > > Are you asking these questions because you're running out of space
> > > faster than you expect and you'd like to expire data faster?
> >
> > You're right. We want to know the reason, and how to purge that old data
> > soon if possible.
> >
> > And I want to understand, *in advance*, why those old records reported
> > by the sstablemetadata command persist in the SSTable data file.
>
> Not to self promote too much, but I've given a few talks on running time
> series Cassandra clusters. These slides
> https://www.slideshare.net/mobile/JeffJirsa1/using-time-window-compaction-strategy-for-time-series-workloads
> (in video form here: https://m.youtube.com/watch?v=PWtekUWCIaw ) may be
> useful.
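(To make the TTL point above concrete, here is a minimal cqlsh sketch; the table and column names are made up for illustration. Both the table-level default_time_to_live and a per-write TTL are applied at write time, so lowering the TTL later does not shorten the expiry of rows already written with the 180-day value:

----
-- hypothetical example table
CREATE TABLE sensor_readings (
    id    text,
    ts    timestamp,
    value double,
    PRIMARY KEY (id, ts)
) WITH default_time_to_live = 15552000;   -- 180 days, applied to each write

-- this row keeps its 180-day TTL even after the ALTER below
INSERT INTO sensor_readings (id, ts, value) VALUES ('a', '2017-06-01', 1.0);

-- only writes made after this point get the 1-day default
ALTER TABLE sensor_readings WITH default_time_to_live = 86400;

-- a per-write TTL overrides the table default for that write only
INSERT INTO sensor_readings (id, ts, value)
VALUES ('a', '2017-12-11', 2.0) USING TTL 86400;
----

Once a cell expires it still has to age past gc_grace_seconds and be compacted together with any overlapping SSTables before the space is actually reclaimed, which is what the rest of this thread discusses.)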
> B.T.W., I'm sorry, but please let me ask the question again.
> Here is an excerpt of the sstablemetadata command output below.
>
> Does the section "Estimated tombstone drop times" mean that the SSTable
> contains tombstones for records that should expire on the date in the
> first column? And might the data itself exist in other SSTables?
>
> (excerpt)
> ----
> Estimated tombstone drop times:%n
> 1510934467: 2475 * 2017.11.18
> 1510965112: 135
> 1510983500: 225
> 1511003962: 105
> 1511021113: 2280
> 1511037818: 30
> 1511055563: 120
> ----
>
> Regards,
> Takashima
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com <jji...@gmail.com>]
> *Sent:* Monday, December 11, 2017 2:35 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Tombstoned data seems to remain after compaction
>
> Mutations read during boot won't go into the memtable unless the mutation
> is in the commitlog (which usually means fairly recent - they're a fixed
> size).
>
> Are all of your writes TTL'd in this table?
>
> Which compaction strategy are you using?
>
> Are you asking these questions because you're running out of space faster
> than you expect and you'd like to expire data faster?
>
> --
> Jeff Jirsa
>
> On Dec 10, 2017, at 9:30 PM, "tak...@fujitsu.com" <tak...@fujitsu.com> wrote:
>
> Hi Kurt,
>
> Thanks for your reply!
>
> > The tombstone needs to compact with every SSTable that contains data for
> > the corresponding tombstone.
>
> Let me explain my understanding by example:
>
> 1. A record is inserted with a 180-day TTL (very long).
> 2. The record is saved to SSTable (A) when the server restarts, or on some
>    event like that.
> 3. After 180 days pass, the Cassandra process reads SSTable (A) during its
>    boot process (or on read access?) and puts a tombstone for the record
>    in **memory**.
> 4. The tombstone in **memory** is saved to SSTable (B) the next time the
>    server is rebooted.
>
> In the procedure above, the record itself and its tombstone end up in
> separate SSTables.
>
> Is my understanding correct?
>
> Regards,
> Takashima
>
> *From:* kurt greaves [mailto:k...@instaclustr.com <k...@instaclustr.com>]
> *Sent:* Monday, December 11, 2017 1:46 PM
> *To:* User <user@cassandra.apache.org>
> *Subject:* Re: Tombstoned data seems to remain after compaction
>
> The tombstone needs to compact with every SSTable that contains data for
> the corresponding tombstone. For example, the tombstone may be in that
> SSTable, but some data the tombstone covers may possibly be in another
> SSTable. Only once all SSTables that contain relevant data have been
> compacted with the SSTable containing the tombstone can the tombstone be
> removed.
>
> On 11 December 2017 at 01:08, tak...@fujitsu.com <tak...@fujitsu.com> wrote:
>
> Hi All,
>
> I'm using Size Tiered Compaction Strategy with the default gc grace period
> of 10 days.
>
> The sstablemetadata command shows "Estimated tombstone drop times" as
> follows after a minor compaction on 9th Dec, 2017.
>
> (excerpt)
> Estimated tombstone drop times:%n
> 1510934467: 2475 * 2017.11.18
> 1510965112: 135
> 1510983500: 225
> 1511003962: 105
> 1511021113: 2280
> 1511037818: 30
> 1511055563: 120
> 1511075445: 165
>
> From the output above, I think there are records in the SSTable that should
> have been deleted on 18th Nov, 2017. Is my understanding correct?
>
> If so, could someone tell me why that expired data remains after
> compaction?
> Regards,
> Takashima
>
> ----------------------------------------------------------------------
> Toshiaki Takashima
> Toyama Fujitsu Limited
> +810764553131, ext. 7260292
> ----------------------------------------------------------------------
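A side note on reading the "Estimated tombstone drop times" excerpts above: the keys are Unix epoch timestamps in seconds (roughly, the times at which the cells expire or were deleted), and the values are cell counts. One quick way to turn a key into a date, assuming GNU date is available (on macOS/BSD the flag is -r rather than -d):

----
$ date -u -d @1510934467
Fri Nov 17 16:01:07 UTC 2017
----

That is 18 November 2017 in JST, matching the annotation in the excerpt. The tombstones for those cells still have to age past gc_grace_seconds (the 10-day default here) and then be compacted together with any overlapping SSTables before the data actually disappears from disk.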