Re: Cassandra timeout on node failure

Robert Coli Thu, 23 Jan 2014 11:21:13 -0800

On Thu, Jan 23, 2014 at 8:52 AM, Ankit Patel <patel7...@hotmail.com> wrote:


> We are seeing a weird issue with our Cassandra cluster (version 1.0.10).
> We have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas
> of each other. All reads and writes are LOCAL_QUORUM.
>

Frankly I'm surprised that 1.0.10 includes LOCAL_QUORUM.

My first advice would be to upgrade; current trunk is 3 major versions
above 1.0.10.


> When one of the nodes in DC1 fails, we see read timeout errors on the
> second node. With DEBUG-level logging turned on, we see the following
> error in the Cassandra logs:
> DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676)
> Read timeout: java.util.concurrent.TimeoutException: Operation timed out -
> received only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .
> Considering that for LOCAL_QUORUM we only need 2 nodes out of the 3 in
> the DC, I am surprised we are seeing this issue. The log clearly says it
> has received 2 responses. Interestingly, when we connect to the third node
> after the second node has returned a timeout error, it works as expected.
> Has anyone else faced this issue?
>
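
For reference, the quorum arithmetic behind the "2 nodes out of the 3"
expectation can be sketched as below. This is only an illustration, not
Cassandra code: the function names are hypothetical, and the replication
factor of 3 per DC is an assumption read off the "all 6 nodes are replicas
of each other" description. It uses the usual quorum formula,
floor(RF / 2) + 1, applied to the local DC's replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def local_quorum_reachable(replication_factor: int, live_local_replicas: int) -> bool:
    """True if enough replicas in the local DC are up to satisfy LOCAL_QUORUM."""
    return live_local_replicas >= quorum(replication_factor)


# Assumed setup from the report: RF=3 in DC1, one node down, two still up.
rf_per_dc = 3
print(quorum(rf_per_dc))                                         # 2
print(local_quorum_reachable(rf_per_dc, live_local_replicas=2))  # True
```

Under that assumption, two live replicas in DC1 should be enough, which
is why a timeout that reports 2 responses looks inconsistent.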

Have you searched the Apache JIRA? If you can replicate on more modern
(1.2.13 / 2.0.4) Cassandra, file a JIRA!

=Rob
