Forgot to mention that there were also LEAK DETECTED errors:

ERROR [Reference-Reaper] 2020-12-11 03:25:42,172 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@451030de) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1272432140:Memory@[7f6237800000..7f623aa00000) was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2020-12-11 03:25:42,172 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4fe85bae) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@183159863:[Memory@[0..f060), Memory@[0..10e6c0)] was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2020-12-11 03:25:42,173 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4eb88b74) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@992658185:/data_path/md-1105027-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2020-12-11 03:25:42,176 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3692dae9) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1791308664:/data_path/md-1105027-big-Index.db was not released before the reference was garbage collected

On Fri, Dec 11, 2020 at 6:50 PM Shalom Sagges <shalomsag...@gmail.com> wrote:

> Hi All,
>
> I upgraded Cassandra from v3.11.4 to v3.11.8.
> The upgrade went smoothly, however, after a few hours, a node crashed on
> OOM and a few hours later, another one crashed.
>
> Seems like they crashed from excessive GC behaviour (CMS). The logs show
> Map failures on CompactionExecutor:
>
> ERROR *[CompactionExecutor:744]* 2020-12-11 03:25:42,169 JVMStabilityInspector.java:94 - OutOfMemory error letting the JVM handle the error:
> ERROR [CompactionExecutor:744] 2020-12-11 03:25:37,765 CassandraDaemon.java:235 - Exception in thread Thread[CompactionExecutor:744,1,main]
> org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
>     at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157)
>     at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310)
>     at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246)
>     at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:170)
>     at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
>     ...
>     ...
> Caused by: java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
>     at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153)
>     ... 23 common frames omitted
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
>     ... 24 common frames omitted
>
>
> *[CompactionExecutor:744] did the following before the crash:*
> INFO [CompactionExecutor:744] 2020-12-11 03:00:29,985 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
> WARN [CompactionExecutor:744] 2020-12-11 03:10:57,437 BigTableWriter.java:211 - Writing large partition XXXX (108.963MiB)....
> WARN [CompactionExecutor:744] 2020-12-11 03:10:57,437 BigTableWriter.java:211 - Writing large partition YYYY (151.155MiB)
> WARN [CompactionExecutor:744] 2020-12-11 03:11:16,445 BigTableWriter.java:211 - Writing large partition ZZZZ (253.149MiB)
>
>
> *Some more info:*
> The *max_map_count* is set to 1048575, so all is well there.
> Hugepages are enabled by default (I know I should disable them), but I
> don't think it can cause this behaviour.
> This never happened on v3.11.4, only on v3.11.8.
>
>
> I'd really appreciate your help on this one.
> Thanks!
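
On the max_map_count point: the configured limit is only half the picture, since what matters is how many mappings the Cassandra process actually holds against it. Below is a rough sketch for checking that on Linux. It assumes the standard /proc layout, where each line of /proc/<pid>/maps is one mapping and /proc/sys/vm/max_map_count holds the limit. The function names and the PID in the usage note are placeholders of mine, not anything taken from Cassandra or from the logs above.

```python
from pathlib import Path
from typing import Tuple


def count_mappings(maps_text: str) -> int:
    """Count mappings in the text of /proc/<pid>/maps (one mapping per non-empty line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(pid: int, proc_root: str = "/proc") -> Tuple[int, int]:
    """Return (mappings currently held by pid, vm.max_map_count).

    Assumes a Linux-style proc filesystem under proc_root and enough
    permissions to read the target process's maps file.
    """
    root = Path(proc_root)
    used = count_mappings((root / str(pid) / "maps").read_text())
    limit = int((root / "sys" / "vm" / "max_map_count").read_text().strip())
    return used, limit
```

Something like `used, limit = mapping_headroom(12345)` (with the real Cassandra PID in place of the made-up 12345), run shortly before a crash window, would show whether the node is anywhere near the limit. If it is not, the "Map failed" errors presumably have a different cause than the mapping count.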