Re: TWCS sstables not dropping even though all data is expired

Jeff Jirsa Fri, 03 May 2019 20:21:32 -0700

Repairs work fine with TWCS, but having a non-expiring row will prevent 
tombstones in newer sstables from being purged.
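
A toy Python model of that blocking effect may help. It is a simplification, not Cassandra's actual code: the `SSTable` class, its fields, and `fully_expired` are hypothetical names, timestamps are plain integers, and every sstable is assumed to overlap every other one.

```python
from dataclasses import dataclass

NEVER = float("inf")  # stand-in for "no TTL": the data never expires


@dataclass
class SSTable:
    name: str
    min_ts: int               # oldest write timestamp in the sstable
    max_ts: int               # newest write timestamp in the sstable
    max_deletion_time: float  # latest expiry time of anything in the sstable


def fully_expired(sstables, gc_before):
    """Return names of sstables that could be dropped whole (simplified)."""
    # Candidates: everything in the sstable expired before gc_before.
    candidates = [s for s in sstables if s.max_deletion_time < gc_before]
    # Still-live sstables may hold older data that a candidate shadows.
    live = [s for s in sstables if s.max_deletion_time >= gc_before]
    min_live_ts = min((s.min_ts for s in live), default=NEVER)
    # A candidate is safe to drop only if all of its data is older than
    # the oldest data in any still-live sstable.
    return [s.name for s in candidates if s.max_ts < min_live_ts]


# "old" contains one row written without a TTL, so it never expires.
with_rogue = [
    SSTable("old", min_ts=1, max_ts=10, max_deletion_time=NEVER),
    SSTable("mid", min_ts=11, max_ts=20, max_deletion_time=30),
    SSTable("new", min_ts=21, max_ts=30, max_deletion_time=40),
]
# Same sstables, but every row in "old" carried a TTL.
without_rogue = [
    SSTable("old", min_ts=1, max_ts=10, max_deletion_time=20),
    SSTable("mid", min_ts=11, max_ts=20, max_deletion_time=30),
    SSTable("new", min_ts=21, max_ts=30, max_deletion_time=40),
]

print(fully_expired(with_rogue, gc_before=100))
print(fully_expired(without_rogue, gc_before=100))
```

In this model the single never-expiring sstable holds the oldest data, so every newer sstable fails the comparison and nothing is dropped, even though "mid" and "new" are entirely expired.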


I suspect someone did a manual insert/update without a ttl, and that 
effectively blocks all other expiring cells from being purged.
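
One way to look for such a row is to scan `sstabledump` output for rows with no ttl. The following is a rough Python sketch under stated assumptions: `rows_without_ttl` is a hypothetical helper, the input is the already-parsed JSON array of partitions, and only the row-level `liveness_info` shape seen in the dumps quoted below is checked.

```python
import json


def rows_without_ttl(partitions):
    """Yield (partition key, clustering) for rows whose liveness_info has no ttl."""
    for entry in partitions:
        key = entry["partition"]["key"]
        for row in entry.get("rows", []):
            if row.get("type") != "row":
                continue  # skip anything that is not a regular row
            liveness = row.get("liveness_info")
            # Assumption: only row-level liveness is inspected. Rows that
            # carry no liveness_info at all are skipped, not flagged.
            if liveness is not None and "ttl" not in liveness:
                yield key, row.get("clustering")


# Trimmed-down sample shaped like the two dumps quoted in this thread.
sample = json.loads("""
[
  {
    "partition": {"key": ["some_user_id_value", "user_id", "demo-test"]},
    "rows": [
      {
        "type": "row",
        "clustering": ["2019-01-22 15:27:45.000Z"],
        "liveness_info": {"tstamp": "2019-01-22T15:31:12.415081Z"}
      }
    ]
  },
  {
    "partition": {"key": ["some_other_user_id", "user_id", "some_site_id"]},
    "rows": [
      {
        "type": "row",
        "clustering": ["2019-01-22 17:59:35.547Z"],
        "liveness_info": {"tstamp": "2019-01-22T17:59:35.708Z", "ttl": 86400,
                          "expires_at": "2019-01-23T17:59:35Z", "expired": true}
      }
    ]
  }
]
""")

for key, clustering in rows_without_ttl(sample):
    print(key, clustering)
```

Against a real sstable you would replace `sample` with the result of `json.load` on a saved `sstabledump` output file; on large sstables that file can be big, so treat this as a diagnostic sketch rather than a tool.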

-- 
Jeff Jirsa


> On May 3, 2019, at 7:57 PM, Nick Hatfield <nick.hatfi...@metricly.com> wrote:
> 
> Hi Mike,
>  
> If you will, share your compaction settings. More than likely, your issue 
> has one of two causes:
> 1. You have read repair chance set to anything other than 0
> 2. You’re running repairs on the TWCS CF
>  
> Or both.
>  
> From: Mike Torra [mailto:mto...@salesforce.com.INVALID] 
> Sent: Friday, May 03, 2019 3:00 PM
> To: user@cassandra.apache.org
> Subject: Re: TWCS sstables not dropping even though all data is expired
>  
> Thx for the help Paul - there are definitely some details here I still don't 
> fully understand, but this helped me resolve the problem and know what to 
> look for in the future :)
>  
> On Fri, May 3, 2019 at 12:44 PM Paul Chandler <p...@redshots.com> wrote:
> Hi Mike,
>  
> For TWCS the sstable can only be deleted when all the data has expired in 
> that sstable, but you had a record without a ttl in it, so that sstable could 
> never be deleted.
>  
> That bit is straightforward. The next bit I remember reading somewhere, but 
> I can’t find it at the moment to confirm my thinking.
>  
> An sstable can only be deleted if it is the earliest sstable. I think this is 
> because deleting later sstables may expose old versions of the data stored 
> in the stuck sstable which had been superseded. For example, if a later 
> sstable held a tombstone for the non-TTLed record causing the problem in 
> this instance, then deleting that sstable would cause the deleted data to 
> reappear. (Someone please correct me if I have this wrong.)
>  
> Because sstables in different time buckets are never compacted together, this 
> problem only went away when you ran the major compaction.
>  
> This would happen on all replicas of the data, which is why you saw this 
> problem on 3 nodes.
>  
> Thanks 
>  
> Paul
> www.redshots.com
> 
> 
> On 3 May 2019, at 15:35, Mike Torra <mto...@salesforce.com.INVALID> wrote:
>  
> This does indeed seem to be a problem of overlapping sstables, but I don't 
> understand why the data (and number of sstables) just continues to grow 
> indefinitely. I also don't understand why this problem is only appearing on 
> some nodes. Is it just a coincidence that the one rogue test row without a 
> ttl is at the 'root' sstable causing the problem (ie, from the output of 
> `sstableexpiredblockers`)?
>  
> Running a full compaction via `nodetool compact` reclaims the disk space, but 
> I'd like to figure out why this happened and prevent it. Understanding why 
> this problem would be isolated the way it is (ie only one CF even though I 
> have a few others that share a very similar schema, and only some nodes) 
> seems like it will help me prevent it.
>  
>  
> On Thu, May 2, 2019 at 1:00 PM Paul Chandler <p...@redshots.com> wrote:
> Hi Mike,
>  
> It sounds like that record may have been deleted. If that is the case, it 
> would still be shown in this sstable, but the tombstone for it would be in a 
> later sstable. You can use `nodetool getsstables` to work out which sstables 
> contain the data.
>  
> I recommend reading The Last Pickle post on this: 
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 
> The sections towards the bottom of that post may well explain why the 
> sstable is not being deleted.
>  
> Thanks 
>  
> Paul
> www.redshots.com
> 
> 
> On 2 May 2019, at 16:08, Mike Torra <mto...@salesforce.com.INVALID> wrote:
>  
> I'm pretty stumped by this, so here is some more detail if it helps.
>  
> Here is what the suspicious partition looks like in the `sstabledump` output 
> (some pii etc redacted):
> ```
> {
>     "partition" : {
>       "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>       "position" : 210
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 1132,
>         "clustering" : [ "2019-01-22 15:27:45.000Z" ],
>         "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
>         "cells" : [
>           { "some": "data" }
>         ]
>       }
>     ]
>   }
> ```
>  
> And here is what every other partition looks like:
> ```
> {
>     "partition" : {
>       "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>       "position" : 1133
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 1234,
>         "clustering" : [ "2019-01-22 17:59:35.547Z" ],
>         "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
>         "cells" : [
>           { "name" : "activity_data", "deletion_info" : { "local_delete_time" : "2019-01-22T17:59:35Z" } }
>         ]
>       }
>     ]
>   }
> ```
>  
> As expected, almost all of the data except this one suspicious partition has 
> a ttl and is already expired. But if a partition isn't expired and I see it 
> in the sstable, why wouldn't I see it when executing a CQL query against the 
> CF? Why would this sstable be preventing so many other sstables from getting 
> cleaned up?
>  
> On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com> wrote:
> Hello -
>  
> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few 
> months ago I started noticing disk usage on some nodes increasing 
> consistently. At first I solved the problem by destroying the nodes and 
> rebuilding them, but the problem returns.
>  
> I did some more investigation recently, and this is what I found:
> - I narrowed the problem down to a CF that uses TWCS, by simply looking at 
> disk space usage
> - in each region, 3 nodes have this problem of growing disk space (matches 
> replication factor)
> - on each node, I tracked down the problem to a particular SSTable using 
> `sstableexpiredblockers`
> - in the SSTable, using `sstabledump`, I found a row that does not have a ttl 
> like the other rows, and appears to be from someone else on the team testing 
> something and forgetting to include a ttl
> - all other rows show "expired: true" except this one, hence my suspicion
> - when I query for that particular partition key, I get no results
> - I tried deleting the row anyways, but that didn't seem to change anything
> - I also tried `nodetool scrub`, but that didn't help either
>  
> Would this rogue row without a ttl explain the problem? If so, why? If not, 
> does anyone have any other ideas? Why does the row show in `sstabledump` but 
> not when I query for it?
>  
> I appreciate any help or suggestions!
>  
> - Mike
>  
>
