Hi,

While upgrading our production cluster from C* 3.11.14 to 4.1.3, we ran into an issue where some SELECT queries failed because supposedly no replica was available. The system logs on the C* nodes were full of messages like the following one:
ERROR [ReadStage-1] 2023-12-11 13:53:57,278 JVMStabilityInspector.java:68 - Exception in thread Thread[ReadStage-1,5,SharedPool]
java.lang.IllegalStateException: [channel_data_id, control_system_type, server_id, decimation_levels] is not a subset of [channel_data_id]
	at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:593)
	at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:523)
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:231)
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:205)
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:137)
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:125)
	at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
	at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:95)
	at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:80)
	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
	at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
	at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
	at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:182)
	at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
	at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:337)
	at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:63)
	at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
	at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
	at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
	at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
	at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

This problem only persisted while the cluster had a mix of 3.11.14 and 4.1.3 nodes. As soon as the last node was upgraded, the problem disappeared immediately, so I suspect it was somehow caused by the schema inconsistency that is unavoidable during the upgrade. I just wanted to give everyone who hasn't upgraded yet a heads-up, so that they are aware this problem might exist.

Interestingly, not all queries involving the affected table seemed to trigger the error. As far as I am aware, no schema changes have ever been made to that table, so I am fairly certain the schema inconsistencies were purely related to the upgrade process.

We hadn't noticed this problem when testing the upgrade on our test cluster, because there we first completed the upgrade and only then ran the test workload. So, if you are worried you might be affected by this problem as well, you might want to run your workload on the test cluster while it still has mixed versions.

I did not investigate the cause further, because simply completing the upgrade seemed like the quickest way to get the cluster fully operational again.

Cheers,
Sebastian
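
P.S.: For anyone wondering what "not all queries were affected" might look like in practice, here is a purely illustrative CQL sketch. Only the four column names come from the exception message above; the table name, types, and primary-key layout are my hypothetical reconstruction, and I have not verified which query shapes actually trigger the error:

    -- Hypothetical table, reconstructed only from the column names in the
    -- exception; the real schema is likely to differ.
    CREATE TABLE channels (
        channel_data_id uuid,
        control_system_type text,
        server_id uuid,
        decimation_levels set<int>,
        PRIMARY KEY (channel_data_id)
    );

    -- Guess: a query touching only channel_data_id corresponds to the
    -- smaller [channel_data_id] column set from the error message and
    -- may have kept working ...
    SELECT channel_data_id FROM channels WHERE channel_data_id = ?;

    -- ... while a query pulling the remaining columns corresponds to the
    -- larger [channel_data_id, control_system_type, server_id,
    -- decimation_levels] set that the mixed-version serializer rejected.
    SELECT channel_data_id, control_system_type, server_id, decimation_levels
    FROM channels WHERE channel_data_id = ?;

If nothing else, comparing the two column lists in the IllegalStateException against your own queries might help you decide which parts of your workload to exercise on a mixed-version test cluster.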