We are seeing a weird issue with our Cassandra cluster (version 1.0.10). We have 6 nodes in our cluster (DC1: 3, DC2: 3), so all 6 nodes are replicas of each other. All reads and writes use LOCAL_QUORUM. When one of the nodes in DC1 fails, we see read timeout errors on the second node. When we turned on DEBUG-level logging, we saw the following error in the Cassandra logs:
DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .

Since LOCAL_QUORUM only needs 2 of the 3 replicas in the local DC, I am surprised we are seeing this issue, especially because the log clearly says it received 2 responses. Interestingly, when we connect to the third node after the second node has returned a timeout error, the read works as expected. Has anyone else faced this issue?
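The "only 2 of 3 needed" expectation comes from the standard quorum formula, floor(RF / 2) + 1. The short sketch below spells that arithmetic out. It assumes a replication factor of 3 in each data center (DC1: 3, DC2: 3), which is what "all 6 nodes are replicas of each other" implies. The helper names `quorum` and `local_quorum_survives` are hypothetical and only illustrate the math, not any Cassandra API.

```python
# Sketch of the LOCAL_QUORUM arithmetic, assuming replication factor 3
# in each data center (DC1: 3, DC2: 3). Helper names are hypothetical.

def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum_survives(replication_factor: int, nodes_down: int) -> bool:
    """True if enough live replicas remain in the local DC to reach LOCAL_QUORUM."""
    live = replication_factor - nodes_down
    return live >= quorum(replication_factor)


replication = {"DC1": 3, "DC2": 3}  # assumed per-DC replication factors

for dc, rf in replication.items():
    print(dc, "LOCAL_QUORUM needs", quorum(rf), "of", rf)

# One node down in DC1 still leaves 2 live replicas, enough for LOCAL_QUORUM.
print("DC1 with 1 node down:", local_quorum_survives(replication["DC1"], 1))
```

By this arithmetic, losing one DC1 node should still leave LOCAL_QUORUM reads satisfiable, which is why the timeout in the log is surprising.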