Hi everyone,

I'm running a Cassandra 1.2.19 cluster of 40 nodes, and compactions of a specific column family sporadically raise an AssertionError like this (full stack trace available at https://gist.github.com/rborer/46862d6d693c0163aa8fe0e74caa2d9a):

ERROR [CompactionExecutor:9137] 2018-08-27 11:43:05,197 org.apache.cassandra.service.CassandraDaemon - Exception in thread Thread[CompactionExecutor:9137,1,main]
java.lang.AssertionError: 2
    at org.apache.cassandra.db.compaction.LeveledManifest.replace(LeveledManifest.java:267)

The data written to this column family can be seen as wide rows, that is, rows with lots of columns. Each column has a TTL of 7 days, though.

Whenever this happens, it seems to block compactions of this column family (I see the number of pending compactions increasing) until I restart the failing node.

I have searched JIRA and this mailing list for this issue without much luck. I suspect it may be related to https://issues.apache.org/jira/browse/CASSANDRA-6563, although it's hard for me to confirm.

I know this version is pretty old, but does this issue ring a bell for any of you?

Here are some more details about my cluster:

- it is composed of 40 nodes
- it is pretty old and I'm in the process of upgrading it; it previously ran without issues under versions 1.0.12 and 1.1.12
- the issue affects only a single column family (schema can be seen at https://gist.github.com/rborer/46862d6d693c0163aa8fe0e74caa2d9a#file-schema-txt )
- my cluster is set up with RandomPartitioner (inherited from when it was set up on version 0.7) and a replication factor of 3
- it runs weekly repairs (and this assertion happens mostly during repairs)
- I also noticed that since the cluster was upgraded to 1.2.19, the on-disk size of this column family keeps increasing (it went from 400G to 1.2T!)

Thanks in advance for your help.

Best regards,
Reynald