I created CASSANDRA-1617

On Oct 12, 2010, at 1:51 PM, Michael Moores wrote:
> I have a cluster of 8 nodes with a replication factor of 3 and consistency of
> QUORUM.
> When I stop one node in the cluster I end up with socket read timeouts to
> other nodes:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> The timeout is set to 4 seconds.
>
> The recent write latency on the other nodes becomes elevated, reaching 5-6
> seconds.
> The load on the cluster is minimal.
>
> I looked at the system log on an adjacent node in the ring and found this
> output:
>
> INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:36,712 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
> INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:50,336 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
> INFO [Timer-0] 2010-10-12 12:14:22,792 Gossiper.java (line 196) InetAddress /172.27.109.32 is now dead.
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,917 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>     at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,918 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thread[MUTATION_STAGE:1315,5,main]
> java.nio.BufferUnderflowException
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>     at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1605] 2010-10-12 12:14:28,919 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>     at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> ....
> ....
>
> I restarted the previously stopped node, and the system recovers, but with a
> few more underflow exceptions:
>
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 Gossiper.java (line 594) Node /172.27.109.32 has restarted, now UP again
> INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,537 HintedHandOffManager.java (line 196) Started hinted handoff for endpoint /172.27.109.32
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 StorageService.java (line 643) Node /172.27.109.32 state jump to normal
> INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,538 HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to endpoint /172.27.109.32
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,538 StorageService.java (line 650) Will not change my token ownership to /172.27.109.32
> ERROR [MUTATION_STAGE:1635] 2010-10-12 12:15:45,083 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>     at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
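
For context on why stopping a single node would not be expected to fail QUORUM requests by itself: with a replication factor of 3, a quorum is 2 replicas, so each key can lose one replica and still meet the consistency level. A minimal Python sketch of that arithmetic follows. The function names are hypothetical and are not part of Cassandra; only the replication factor of 3 comes from the report above.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM operation: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a key that can be down while QUORUM can still be met."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # replication factor from the report
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated failures={tolerated_replica_failures(rf)}")
```

This only covers the replica-count arithmetic; it says nothing about the timeouts or the BufferUnderflowException seen in the logs.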