Hello, I'm running v3.0.8 in a multi-data-center deployment (6 DCs, 6 nodes per DC, maximum latency between some nodes ~200 ms). After a clean cluster start, "nodetool describecluster" shows some random nodes from the deployment as UNREACHABLE, yet "nodetool status" and "nodetool gossipinfo" report all nodes as UP and in NORMAL state. There is nothing suspicious in the log files either.
As a result, ALTER USER fails with the following message: WriteTimeout: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 8 responses." info={'received_responses': 8, 'required_responses': 10, 'consistency': 'QUORUM'}
Any hints or guidance on how to troubleshoot this issue? I have seen advice to restart the affected nodes and tried it, but it helps only until the first query to Cassandra.
Related question: how production-ready is the 3.0.x version? Should I switch to the more stable 2.x even though its support ends in Nov 2016?
P.S. Date/time is the same on all nodes; they are synced with NTP.
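One more note on the numbers in the error above: required_responses of 10 matches the usual QUORUM formula, floor(sum of replication factors / 2) + 1, if the keyspace being written is replicated 3 times in each of the 6 DCs. That replication setting is only my assumption here, not something I have verified, and the function name below is just for illustration. A minimal sketch of the arithmetic:

```python
def quorum(replication_factors):
    """Replica acknowledgements needed for QUORUM across all data centers."""
    total_rf = sum(replication_factors)
    return total_rf // 2 + 1


# Assumption: RF=3 in each of the 6 data centers (not confirmed in this post).
assumed_rf_per_dc = [3] * 6

required = quorum(assumed_rf_per_dc)
received = 8  # 'received_responses' from the WriteTimeout message
missing = required - received

print(f"required={required} received={received} missing={missing}")
```

Under that assumption the coordinator was short by two replica acknowledgements, which would fit a couple of replicas being unreachable or too slow to answer within the write timeout.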