unreachable nodes mystery in describecluster output

Aleksandr Ivanov Wed, 03 Aug 2016 05:18:13 -0700

Hello,

I'm running v3.0.8 in a multi-data-center deployment (6 DCs, 6 nodes per DC,
maximum latency between some nodes ~200 ms).
After a clean cluster start, I run into an issue where "nodetool describecluster"
shows some random nodes from the deployment as UNREACHABLE, even though
"nodetool status" and "nodetool gossipinfo" report all nodes as UP and in the
NORMAL state.
There is nothing suspicious in the log files either.


As a result, ALTER USER fails with the following message:
WriteTimeout: code=1100 [Coordinator node timed out waiting for replica
nodes' responses] message="Operation timed out - received only 8
responses." info={'received_responses': 8, 'required_responses': 10,
'consistency': 'QUORUM'}
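
For reference, here is a rough sketch of the quorum arithmetic behind the
numbers in that error. QUORUM needs floor(total replicas / 2) + 1 responses, so
required_responses=10 would match 18 or 19 replicas in total, for example a
replication factor of 3 in each of the 6 DCs. The replication settings of the
affected keyspace are not stated above, so that per-DC figure is only an
assumption, and the function names are made up for illustration.

```python
from typing import List


def quorum(total_replicas: int) -> int:
    """Responses needed at QUORUM: floor(total_replicas / 2) + 1."""
    return total_replicas // 2 + 1


def replica_counts_for(required_responses: int, limit: int = 64) -> List[int]:
    """Total replica counts (up to limit) whose quorum equals required_responses."""
    return [n for n in range(1, limit + 1) if quorum(n) == required_responses]


if __name__ == "__main__":
    # Assumption: RF 3 in each of 6 DCs (not confirmed by the post).
    assumed_total = 6 * 3
    print("quorum for", assumed_total, "replicas:", quorum(assumed_total))
    print("replica counts needing 10 responses:", replica_counts_for(10))
```

Under that assumption, 8 responses out of a required 10 means at least 2 of
the replicas the coordinator was waiting on did not answer in time.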

Any hints or guidance on how to troubleshoot this issue?

I have seen advice to restart the affected nodes and have tried it, but it
helps only until the first query to Cassandra.
Related question: how production-ready is the 3.0.x version? Should I switch to
the more stable 2.x even though its support ends in Nov 2016?

P.S. Date/time is the same on all nodes; they are synced with NTP.
