On Thu, Jan 23, 2014 at 8:52 AM, Ankit Patel <patel7...@hotmail.com> wrote:
> We are seeing a weird issue with our Cassandra cluster (version 1.0.10).
> We have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas
> of each other. All reads and writes are LOCAL_QUORUM.
>

Frankly, I'm surprised that 1.0.10 includes LOCAL_QUORUM. My first advice
would be to upgrade; current trunk is 3 major versions above 1.0.10.

> We see that when one of the nodes in DC1 fails, we get timeout errors on
> the second node for reads. When we turned on DEBUG-level logs, we saw the
> following error in the Cassandra logs:
>
> DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676)
> Read timeout: java.util.concurrent.TimeoutException: Operation timed out -
> received only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .
>
> Considering that for LOCAL_QUORUM we only need 2 nodes out of the 3 in
> the DC, I am surprised we are seeing this issue. The log clearly says it
> has received 2 responses. Interestingly, when we connect to the third node
> after the second node returned a timeout error, it works as expected. Has
> anyone else faced this issue?

Have you searched the Apache JIRA? If you can replicate on more modern
(1.2.13 / 2.0.4) Cassandra, file a JIRA!

=Rob
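For reference, the arithmetic behind the expectation that "we only need 2 nodes out of the 3" can be sketched as follows. Quorum in Cassandra is a strict majority of replicas, (RF / 2) + 1 with integer division, and LOCAL_QUORUM applies this to the replication factor of the coordinator's local data center. This is an illustrative sketch only, not Cassandra code: `LocalQuorum` and `localQuorum` are hypothetical names, and it assumes a replication factor of 3 in DC1, as the thread implies.

```java
// Hypothetical helper, NOT part of Cassandra's API: computes how many replicas
// in the coordinator's local data center must respond for LOCAL_QUORUM.
public class LocalQuorum {

    // Quorum is a strict majority: floor(rf / 2) + 1 (Java int division floors
    // for non-negative values).
    static int localQuorum(int localReplicationFactor) {
        if (localReplicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return localReplicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rfDc1 = 3;        // assumption: 3 replicas in DC1, as described in the thread
        int liveReplicas = 2; // one DC1 node is down
        int required = localQuorum(rfDc1);
        // Prints: required=2, live=2, satisfiable=true
        System.out.println("required=" + required + ", live=" + liveReplicas
                + ", satisfiable=" + (liveReplicas >= required));
    }
}
```

With one DC1 node down, the two surviving local replicas still meet the required count, which is why a timeout that reports 2 received responses looks like a bug rather than a consistency-level shortfall.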