Compaction settings:

```
compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
              'compaction_window_size': '6',
              'compaction_window_unit': 'HOURS',
              'max_threshold': '32',
              'min_threshold': '4'}
```

read_repair_chance is 0, and I don't do any repairs because (normally) everything has a ttl. It does seem like Jeff is right that a manual insert/update without a ttl is what caused this, so I know how to resolve it and prevent it from happening again.
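For anyone who finds this thread later, one common guard against stray non-ttl writes is a table-level default TTL, which makes even a manual insert that omits `USING TTL` expire. A minimal sketch, with placeholder keyspace/table names (not the actual CF from this thread):

```
# Placeholder names; default_time_to_live is in seconds and applies to
# any write that does not carry its own TTL, so a stray manual insert
# can no longer pin an sstable forever.
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 86400;"
```

A write that does specify `USING TTL` still overrides the table default.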
Thx again for all the help guys, I appreciate it!

On Fri, May 3, 2019 at 11:21 PM Jeff Jirsa <jji...@gmail.com> wrote:

Repairs work fine with TWCS, but having a non-expiring row will prevent tombstones in newer sstables from being purged.

I suspect someone did a manual insert/update without a ttl, and that effectively blocks all other expiring cells from being purged.

--
Jeff Jirsa

On May 3, 2019, at 7:57 PM, Nick Hatfield <nick.hatfi...@metricly.com> wrote:

Hi Mike,

If you will, share your compaction settings. More than likely, your issue is from 1 of 2 reasons:

1. You have read repair chance set to anything other than 0
2. You’re running repairs on the TWCS CF

Or both….

From: Mike Torra [mailto:mto...@salesforce.com.INVALID]
Sent: Friday, May 03, 2019 3:00 PM
To: user@cassandra.apache.org
Subject: Re: TWCS sstables not dropping even though all data is expired

Thx for the help Paul - there are definitely some details here I still don't fully understand, but this helped me resolve the problem and know what to look for in the future :)

On Fri, May 3, 2019 at 12:44 PM Paul Chandler <p...@redshots.com> wrote:

Hi Mike,

For TWCS the sstable can only be deleted when all the data has expired in that sstable, but you had a record without a ttl in it, so that sstable could never be deleted.

That bit is straightforward; the next bit I remember reading somewhere but can't find at the moment to confirm my thinking.

An sstable can only be deleted if it is the earliest sstable. I think this is because deleting later sstables may expose old versions of the data stored in the stuck sstable which had been superseded. For example, if there was a tombstone in a later sstable for the non-TTLed record causing the problem in this instance, then deleting that sstable would cause the deleted data to reappear. (Someone please correct me if I have this wrong.)

Because sstables in different time buckets are never compacted together, this problem only goes away when you did the major compaction.

This would happen on all replicas of the data, hence the reason you see this problem on 3 nodes.

Thanks

Paul
www.redshots.com

On 3 May 2019, at 15:35, Mike Torra <mto...@salesforce.com.INVALID> wrote:

This does indeed seem to be a problem of overlapping sstables, but I don't understand why the data (and number of sstables) just continues to grow indefinitely. I also don't understand why this problem is only appearing on some nodes. Is it just a coincidence that the one rogue test row without a ttl is at the 'root' sstable causing the problem (ie, from the output of `sstableexpiredblockers`)?

Running a full compaction via `nodetool compact` reclaims the disk space, but I'd like to figure out why this happened and prevent it. Understanding why this problem would be isolated the way it is (ie only one CF even though I have a few others that share a very similar schema, and only some nodes) seems like it will help me prevent it.
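For reference, the two tools mentioned above (`sstableexpiredblockers` and `nodetool compact`) both take a keyspace and table as arguments; a sketch of how they might be invoked (placeholder names, not the actual CF from this thread):

```
# Placeholder keyspace/table. sstableexpiredblockers reports which
# sstable(s) are blocking fully-expired TWCS sstables from being dropped.
sstableexpiredblockers my_keyspace my_table

# A one-off major compaction of just this table merges all the time
# buckets together and reclaims the space. It undoes TWCS's bucketing,
# so it is a last-resort cleanup rather than something to schedule.
nodetool compact my_keyspace my_table
```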
On Thu, May 2, 2019 at 1:00 PM Paul Chandler <p...@redshots.com> wrote:

Hi Mike,

It sounds like that record may have been deleted. If that is the case then it would still be shown in this sstable, but the tombstone for that delete would be in a later sstable. You can use nodetool getsstables to work out which sstables contain the data.

I recommend reading The Last Pickle post on this: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html - the sections towards the bottom of that post may well explain why the sstable is not being deleted.

Thanks

Paul
www.redshots.com

On 2 May 2019, at 16:08, Mike Torra <mto...@salesforce.com.INVALID> wrote:

I'm pretty stumped by this, so here is some more detail if it helps.

Here is what the suspicious partition looks like in the `sstabledump` output (some pii etc redacted):

```
{
  "partition" : {
    "key" : [ "some_user_id_value", "user_id", "demo-test" ],
    "position" : 210
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 1132,
      "clustering" : [ "2019-01-22 15:27:45.000Z" ],
      "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
      "cells" : [
        { "some": "data" }
      ]
    }
  ]
}
```

And here is what every other partition looks like:

```
{
  "partition" : {
    "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
    "position" : 1133
  },
  "rows" : [
    {
      "type" : "row",
      "position" : 1234,
      "clustering" : [ "2019-01-22 17:59:35.547Z" ],
      "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
      "cells" : [
        { "name" : "activity_data", "deletion_info" : { "local_delete_time" : "2019-01-22T17:59:35Z" } }
      ]
    }
  ]
}
```

As expected, almost all of the data except this one suspicious partition has a ttl and is already expired. But if a partition isn't expired and I see it in the sstable, why wouldn't I see it when executing a CQL query against the CF? Why would this sstable be preventing so many other sstables from getting cleaned up?

On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com> wrote:

Hello -

I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few months ago I started noticing disk usage on some nodes increasing consistently. At first I solved the problem by destroying the nodes and rebuilding them, but the problem returns.

I did some more investigation recently, and this is what I found:
- I narrowed the problem down to a CF that uses TWCS, by simply looking at disk space usage
- in each region, 3 nodes have this problem of growing disk space (matches replication factor)
- on each node, I tracked down the problem to a particular SSTable using `sstableexpiredblockers`
- in the SSTable, using `sstabledump`, I found a row that does not have a ttl like the other rows, and appears to be from someone else on the team testing something and forgetting to include a ttl
- all other rows show "expired: true" except this one, hence my suspicion
- when I query for that particular partition key, I get no results
- I tried deleting the row anyways, but that didn't seem to change anything
- I also tried `nodetool scrub`, but that didn't help either

Would this rogue row without a ttl explain the problem? If so, why? If not, does anyone have any other ideas? Why does the row show in `sstabledump` but not when I query for it?

I appreciate any help or suggestions!

- Mike
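For anyone debugging a similar case, the inspection commands referenced in this thread can be run like this (keyspace, table, partition key, and file path are all placeholders):

```
# Placeholder names. getsstables lists the sstable files that contain a
# given partition key on this node:
nodetool getsstables my_keyspace my_table some_user_id_value

# sstabledump prints an sstable's contents as JSON, including per-row
# liveness info ("ttl", "expires_at", "expired"), which is how the
# non-expiring row above was spotted:
sstabledump /path/to/data/my_keyspace/my_table/mc-1234-big-Data.db
```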