> load average on DC1 nodes are around 3-5 and on DC2 around 7-10

Anecdotally I can say that loads in the 7-10 range have been dangerously high. When we had a cluster running in this range, the cluster was falling behind on important tasks such as compaction, and we really struggled to successfully bootstrap or repair in that DC (2.1.1 cluster).

On Sun Nov 16 2014 at 6:49:31 AM Gabriel Menegatti <gabr...@s1mbi0se.com.br> wrote:
> Hello,
>
> We are using Cassandra 2.1.2 in a multi dc cluster (30 servers on DC1 and 10 on DC2) with a key space replication factor of 1 on DC1 and 2 on DC2.
>
> For some reason when we increase the volume of write requests on DC1 (using ONE or LOCAL_ONE), the Cassandra java process on DC2 nodes goes down randomly.
>
> At the time DC2 nodes starts to go down, the load average on DC1 nodes are around 3-5 and on DC2 around 7-10.. so not big deal.
>
> *Taking a look at the Cassandra's system.log, we found some exceptions:*
>
> ERROR [SharedPool-Worker-43] 2014-11-15 00:39:48,596 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [CompactionExecutor:8] 2014-11-15 00:39:48,596 CassandraDaemon.java:153 - Exception in thread Thread[CompactionExecutor:8,1,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [Thrift-Selector_2] 2014-11-15 00:39:48,596 Message.java:238 - Got an IOException during write!
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_25]
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_25]
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_25]
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_25]
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[na:1.8.0_25]
>     at org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:164) ~[libthrift-0.9.1.jar:0.9.1]
>     at com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104) ~[thrift-server-0.3.7.jar:na]
>     at com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112) ~[thrift-server-0.3.7.jar:na]
>     at com.thinkaurelius.thrift.Message.write(Message.java:222) ~[thrift-server-0.3.7.jar:na]
>     at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598) [thrift-server-0.3.7.jar:na]
>     at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569) [thrift-server-0.3.7.jar:na]
>     at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423) [thrift-server-0.3.7.jar:na]
>     at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383) [thrift-server-0.3.7.jar:na]
> ERROR [Thread-94] 2014-11-15 00:39:48,597 CassandraDaemon.java:153 - Exception in thread Thread[Thread-94,5,main]
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) ~[na:1.8.0_25]
>     at org.apache.cassandra.db.composites.AbstractCType.sliceBytes(AbstractCType.java:369) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:110) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[apache-cassandra-2.1.2.jar:2.1.2]
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.2.jar:2.1.2]
>
> *Memory:*
> - DC1 servers have 32 GB of RAM and the HEAP is configured to 8 GB.
> - DC2 servers have 16 GB of RAM and the HEAP is also configured to 8 GB.
>
> Please, any hint?
>
> Thanks in advance.
>
> Gabriel.
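
One more thing that may be worth checking, as a rough sketch only. With the replication factors quoted above (1 on DC1 across 30 servers, 2 on DC2 across 10 servers), every write still goes to the DC2 replicas even at ONE or LOCAL_ONE, because the consistency level only controls how many acknowledgements the coordinator waits for. Assuming writes are spread evenly across the nodes of each DC (my assumption, not something stated in the thread), the per-node share of writes can be estimated as below. The function name is made up for illustration.

```python
from fractions import Fraction


def per_node_write_share(replication_factor: int, node_count: int) -> Fraction:
    """Fraction of all cluster writes that lands on each node of one DC.

    Assumes data is spread evenly across the nodes of that DC.
    """
    return Fraction(replication_factor, node_count)


# Numbers taken from the quoted message: RF 1 on 30 nodes, RF 2 on 10 nodes.
dc1 = per_node_write_share(replication_factor=1, node_count=30)
dc2 = per_node_write_share(replication_factor=2, node_count=10)
ratio = dc2 / dc1

print(f"DC1 per-node share of writes: {dc1}")  # 1/30
print(f"DC2 per-node share of writes: {dc2}")  # 1/5
print(f"DC2 node vs DC1 node: {ratio}x")       # 6x
```

Under that assumption each DC2 node takes roughly six times the write volume of a DC1 node while running the same 8 GB heap, which would be consistent with DC2 being the side that runs out of heap first.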