Hi All, I upgraded Cassandra from v3.11.4 to v3.11.8. The upgrade went smoothly; however, after a few hours a node crashed with an OOM error, and a few hours later another one crashed.
Seems like they crashed from excessive GC behaviour (CMS). The logs show Map failures on CompactionExecutor:

ERROR [CompactionExecutor:744] 2020-12-11 03:25:42,169 JVMStabilityInspector.java:94 - OutOfMemory error letting the JVM handle the error:
ERROR [CompactionExecutor:744] 2020-12-11 03:25:37,765 CassandraDaemon.java:235 - Exception in thread Thread[CompactionExecutor:744,1,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
    at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157)
    at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310)
    at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246)
    at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:170)
    at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
    ... ...
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
    at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153)
    ... 23 common frames omitted
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
    ... 24 common frames omitted

*[CompactionExecutor:744] did the following before the crash:*

INFO [CompactionExecutor:744] 2020-12-11 03:00:29,985 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
WARN [CompactionExecutor:744] 2020-12-11 03:10:57,437 BigTableWriter.java:211 - Writing large partition XXXX (108.963MiB)....
WARN [CompactionExecutor:744] 2020-12-11 03:10:57,437 BigTableWriter.java:211 - Writing large partition YYYY (151.155MiB)
WARN [CompactionExecutor:744] 2020-12-11 03:11:16,445 BigTableWriter.java:211 - Writing large partition ZZZZ (253.149MiB)

*Some more info:*

The *max_map_count* is set to 1048575, so all is well there.
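
One way to double-check the mapping limit is to compare the number of memory mappings the Cassandra process currently holds with the kernel limit. Below is a minimal Python sketch, assuming a Linux host where /proc/<pid>/maps has one line per mapping and /proc/sys/vm/max_map_count holds the limit. The function names (count_mappings, read_limit, map_usage) and the proc_root parameter are made up for illustration and are not part of Cassandra or any library.

```python
from pathlib import Path


def count_mappings(maps_path):
    """Count memory mappings: one non-empty line per mapping in a maps file."""
    with open(maps_path) as f:
        return sum(1 for line in f if line.strip())


def read_limit(limit_path):
    """Read the per-process mapping limit (vm.max_map_count) as an integer."""
    return int(Path(limit_path).read_text().strip())


def map_usage(pid, proc_root="/proc"):
    """Return (mappings in use, limit, fraction of limit used) for a process.

    proc_root is a hypothetical parameter so the function can be pointed at a
    copy of the files; on a live host the default /proc is what you want.
    """
    used = count_mappings(f"{proc_root}/{pid}/maps")
    limit = read_limit(f"{proc_root}/sys/vm/max_map_count")
    return used, limit, used / limit
```

Calling map_usage(<cassandra pid>) as the user running Cassandra (or as root) returns the current count alongside the limit. If the count stays far below 1048575 shortly before a crash, the mapping limit is probably not what makes the map call fail.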
Hugepages are enabled by default (I know I should disable them), but I don't think they can cause this behaviour. This never happened on v3.11.4, only on v3.11.8. I'd really appreciate your help on this one. Thanks!