That is a very large heap size for C* - most installations I've seen run in the 8-12GB heap range. Apparently G1GC handles larger heaps better, so switching collectors may help. However, you are probably better off digging a bit deeper into what is using all that heap: massive IN-clause lists? Massive multi-partition batches? Massive partitions?
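To make that concrete, a minimal sketch of what the cassandra-env.sh changes might look like, assuming the stock 2.1 script (the 12G/3G values are illustrative, not a tested recommendation for your workload):

    MAX_HEAP_SIZE="12G"   # modest heap; leave the remaining RAM to the OS page cache
    HEAP_NEWSIZE="3G"     # the stock script expects these two to be set as a pair

    # To try G1 instead of the default CMS, the CMS flags the stock script adds
    # (-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, ...) and the -Xmn new-gen
    # pinning need to be removed, since G1 sizes its regions itself; then:
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"   # pause-time goal; tune as needed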
Especially given it hit two nodes simultaneously, I would be looking for a rogue query as my first point of investigation (a sketch of some quick checks follows the quoted thread below).

Cheers
Ben

On Tue, 27 Sep 2016 at 17:49 xutom <xutom2...@126.com> wrote:

> Hi, all
> I have a C* cluster with 12 nodes running Cassandra 2.1.14. Just now two
> nodes crashed, and the client fails to export data at read consistency
> QUORUM. The following are the logs from the failed nodes:
>
> ERROR [SharedPool-Worker-159] 2016-09-26 20:51:14,124 Message.java:538 -
> Unexpected exception during request; channel = [id: 0xce43a388,
> /13.13.13.80:55536 :> /13.13.13.149:9042]
> java.lang.AssertionError: null
>     at org.apache.cassandra.transport.ServerConnection.applyStateTransition(ServerConnection.java:100) ~[apache-cassandra-2.1.14.jar:2.1.14]
>     at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:442) [apache-cassandra-2.1.14.jar:2.1.14]
>     at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.14.jar:2.1.14]
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
>     at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65]
>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.14.jar:2.1.14]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.14.jar:2.1.14]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> ERROR [SharedPool-Worker-116] 2016-09-26 20:51:14,125 JVMStabilityInspector.java:117 -
> JVM state determined to be unstable. Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [SharedPool-Worker-121] 2016-09-26 20:51:14,125 JVMStabilityInspector.java:117 -
> JVM state determined to be unstable. Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [SharedPool-Worker-157] 2016-09-26 20:51:14,124 Message.java:538 -
> Unexpected exception during request; channel = [id: 0xce43a388,
> /13.13.13.80:55536 :> /13.13.13.149:9042]
>
> Each server has 256G of memory in total, so I set MAX_HEAP_SIZE to 60G in
> cassandra-env.sh:
>     MAX_HEAP_SIZE="60G"
>     HEAP_NEWSIZE="20G"
> How can I resolve this OOM?

--
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
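A sketch of the quick per-node checks mentioned above, assuming a stock 2.1 install (the keyspace/table names and the log path are placeholders, not names from this cluster):

    # Per-table partition size stats; a very large "Compacted partition maximum
    # bytes" points at oversized partitions:
    nodetool cfstats my_keyspace | grep 'Compacted partition'

    # Distribution of partition sizes and cell counts for a suspect table:
    nodetool cfhistograms my_keyspace my_table

    # Large-batch warnings (logged when a batch exceeds batch_size_warn_threshold_in_kb):
    grep -i 'batch' /var/log/cassandra/system.log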