Hi Eric,

Thanks for your reply.

I said the load was not a big deal because OpsCenter shows these loads as green, not yellow or red. Also, our servers have many processors/threads, so I *think* this load is not problematic.

My assumption is that, for some reason, the 10 nodes in DC2 are unable to handle the volume of requests coming from DC1, which has 30 nodes. Even so, in my view the load on the DC2 nodes should climb really high before Cassandra goes down, but it doesn't.

Regards,
Gabriel

Sent from mobile.

> On 16/11/2014, at 12:25, Eric Stevens <migh...@gmail.com> wrote:
>
> > load average on DC1 nodes is around 3-5 and on DC2 around 7-10
>
> Anecdotally I can say that loads in the 7-10 range have been dangerously
> high. When we had a cluster running in this range, the cluster was falling
> behind on important tasks such as compaction, and we really struggled to
> successfully bootstrap or repair in that DC (2.1.1 cluster).
>
>> On Sun Nov 16 2014 at 6:49:31 AM Gabriel Menegatti <gabr...@s1mbi0se.com.br>
>> wrote:
>> Hello,
>>
>> We are using Cassandra 2.1.2 in a multi-DC cluster (30 servers in DC1 and 10
>> in DC2) with a keyspace replication factor of 1 in DC1 and 2 in DC2.
>>
>> For some reason, when we increase the volume of write requests on DC1 (using
>> ONE or LOCAL_ONE), the Cassandra Java process on DC2 nodes goes down
>> randomly.
>>
>> When the DC2 nodes start to go down, the load average on DC1 nodes is
>> around 3-5 and on DC2 around 7-10, so not a big deal.
>>
>> Looking at Cassandra's system.log, we found some exceptions:
>>
>> ERROR [SharedPool-Worker-43] 2014-11-15 00:39:48,596 JVMStabilityInspector.java:94 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [CompactionExecutor:8] 2014-11-15 00:39:48,596 CassandraDaemon.java:153 - Exception in thread Thread[CompactionExecutor:8,1,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [Thrift-Selector_2] 2014-11-15 00:39:48,596 Message.java:238 - Got an IOException during write!
>> java.io.IOException: Broken pipe
>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_25]
>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_25]
>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_25]
>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_25]
>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[na:1.8.0_25]
>>     at org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:164) ~[libthrift-0.9.1.jar:0.9.1]
>>     at com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104) ~[thrift-server-0.3.7.jar:na]
>>     at com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112) ~[thrift-server-0.3.7.jar:na]
>>     at com.thinkaurelius.thrift.Message.write(Message.java:222) ~[thrift-server-0.3.7.jar:na]
>>     at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598) [thrift-server-0.3.7.jar:na]
>>     at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569) [thrift-server-0.3.7.jar:na]
>>     at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423) [thrift-server-0.3.7.jar:na]
>>     at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383) [thrift-server-0.3.7.jar:na]
>> ERROR [Thread-94] 2014-11-15 00:39:48,597 CassandraDaemon.java:153 - Exception in thread Thread[Thread-94,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) ~[na:1.8.0_25]
>>     at org.apache.cassandra.db.composites.AbstractCType.sliceBytes(AbstractCType.java:369) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:110) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.2.jar:2.1.2]
>>
>>
>> Memory:
>> - DC1 servers have 32 GB of RAM and the HEAP is configured to 8 GB.
>> - DC2 servers have 16 GB of RAM and the HEAP is also configured to 8 GB.
>>
>> Please, any hint?
>>
>> Thanks in advance.
>>
>> Gabriel.
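
For what it's worth, here is a rough back-of-the-envelope check of the per-node write volume, using only the numbers given in the thread (30 nodes with RF 1 in DC1, 10 nodes with RF 2 in DC2). It assumes writes are spread evenly across the nodes of each DC, which the thread does not confirm, and the helper name `per_node_write_share` is made up for illustration. Writes are sent to the replicas in every DC regardless of the consistency level; ONE or LOCAL_ONE only changes how many acknowledgements the coordinator waits for.

```python
from fractions import Fraction


def per_node_write_share(replication_factor: int, node_count: int) -> Fraction:
    """Fraction of all client writes an average node in a DC has to apply.

    Assumption: tokens and partition keys are evenly distributed, so each
    write lands on `replication_factor` of the DC's `node_count` nodes.
    """
    return Fraction(replication_factor, node_count)


# Numbers taken from the original message.
dc1 = per_node_write_share(replication_factor=1, node_count=30)
dc2 = per_node_write_share(replication_factor=2, node_count=10)

print(f"DC1 per-node share of writes: {dc1}")   # 1/30
print(f"DC2 per-node share of writes: {dc2}")   # 1/5
print(f"DC2 node vs DC1 node:         {dc2 / dc1}x")  # 6x
```

Under that assumption, each DC2 node applies about six times as many mutations as a DC1 node while having half the RAM and the same 8 GB heap. That would at least be consistent with the OutOfMemoryError showing up only on DC2, though it does not prove that this is the cause.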