Re: TEST Cluster corrupt after removenode. how to restore

Leena Ghatpande Wed, 20 May 2020 13:40:50 -0700

OK, that could be a possibility, as this table has several static columns.
We have seen corrupt SSTable errors related to static columns before, when we 
dropped and recreated the column in this table.


We have an upgrade to 3.11 planned for later this year, so we are hoping these 
issues will be resolved.

Thank you all for your responses.

For now, an offline sstablescrub on the corrupt table has helped us bring the 
cluster back to a stable state.

________________________________
From: Erick Ramirez <[email protected]>
Sent: Wednesday, May 20, 2020 3:28 AM
To: [email protected] <[email protected]>
Subject: Re: TEST Cluster corrupt after removenode. how to restore

I've seen this stacktrace before:

WARN  [SharedPool-Worker-1] 2020-05-18 10:22:29,152 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /opt/app/dir1/dir2/data/keypace/table1-f21d1180f5c211e58c9c31653d0c0f4e/mb-2334-big-Data.db
        ...
Caused by: java.io.IOException: Corrupt flags value for clustering prefix (isStatic flag set): 128
        at org.apache.cassandra.db.ClusteringPrefix$Deserializer.prepare(ClusteringPrefix.java:453) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.prepareNext(UnfilteredDeserializer.java:172) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer.hasNext(UnfilteredDeserializer.java:153) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.computeNext(SSTableIterator.java:124) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.hasNextInternal(SSTableIterator.java:151) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:366) ~[apache-cassandra-3.7.jar:3.7]
        ... 29 common frames omitted

I'm happy to be corrected, but I don't think it's because the SSTables are 
corrupted. I think it's because the columns are not returned in alphabetical 
order in early versions of 3.x. When iterating over the columns of the table, 
C* expects a static column, but because it gets another column type (the 
columns are not ordered the way it expects), it throws the IOException after 
checking that the isStatic flag is not set, and the SSTable is "assumed" to be 
corrupted (because it couldn't be read).
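To see which SSTables were reported with this particular message, a log scan along the following lines may help. It is only a sketch: the two match strings are copied from the stack trace above, while the function name and the idea of pairing each "Corrupted:" report with the text that follows it are assumptions made for illustration, not anything Cassandra provides.

```python
import re

# Message text copied from the stack trace quoted in this thread.
SIGNATURE = "Corrupt flags value for clustering prefix (isStatic flag set)"
# Captures the file path that follows "CorruptSSTableException: Corrupted:".
CORRUPTED_RE = re.compile(r"CorruptSSTableException: Corrupted:\s*(\S+)")


def find_static_flag_reports(log_text):
    """Return SSTable paths whose corruption report is followed by SIGNATURE.

    Hypothetical helper: it only inspects log text and never touches SSTables.
    """
    # Collapse all whitespace so that wrapped log lines still match.
    flat = " ".join(log_text.split())
    matches = list(CORRUPTED_RE.finditer(flat))
    paths = []
    for i, match in enumerate(matches):
        # Look only at the text between this report and the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        if SIGNATURE in flat[match.end():end] and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths
```

Calling `find_static_flag_reports(open("system.log").read())` (the file name is an assumption) would list the affected Data.db files, which can then be compared against the snapshots.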

If this is what's happening, then I think you're hitting 
CASSANDRA-14638<https://jira.apache.org/jira/browse/CASSANDRA-14638>. It was 
fixed in C* 3.0.18 and 3.11.4. If you still have the snapshots of the SSTables 
from before you scrubbed them, you can load them on a cluster running the 
latest C* 3.11.6 and test that you can read them. If so, you wouldn't have lost 
the data and you'd be able to recover it from the snapshots. But it does mean 
that you will need to do a binary upgrade of your cluster to C* 3.11.6. Cheers!
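As a quick way to check whether a given release already carries the fix, the version comparison can be sketched as below. The fix versions (3.0.18 and 3.11.4) are the ones quoted above; the helper names are hypothetical, and treating every release newer than 3.11.4 as fixed is an assumption rather than something stated in the ticket text quoted here.

```python
def parse_version(version):
    """Turn a plain dotted version such as '3.7' or '3.11.6' into a 3-tuple of ints.

    Pre-release suffixes (for example '-beta1') are not handled in this sketch.
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad '3.7' to (3, 7, 0)
    return tuple(parts[:3])


def has_fix(version):
    """Return True if the release is at or beyond a fix version from the thread."""
    v = parse_version(version)
    if v[:2] == (3, 0):
        # The 3.0.x line received the fix in 3.0.18.
        return v >= (3, 0, 18)
    # Assumption: 3.11.4 and anything later includes the fix; releases between
    # 3.1 and 3.10 (such as the 3.7 in the stack trace) do not.
    return v >= (3, 11, 4)
```

With this, `has_fix("3.7")` is False and `has_fix("3.11.6")` is True, which matches the upgrade advice above.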
