Are those timeouts happening right when the node goes down? If so, it might be https://issues.apache.org/jira/browse/CASSANDRA-4705. I don't think that issue applies if the node has been down long enough to be marked as down, though.
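If it is that race, a bounded client-side retry is the usual workaround until the failure detector convicts the dead node. Here's a rough sketch against the raw Thrift API; the keyspace "MyKS", column family "MyCF", and retry parameters are made-up placeholders, not taken from your setup:

    import java.nio.ByteBuffer;
    import java.util.List;

    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class RetryingSliceRead {

        // Retry a LOCAL_QUORUM get_slice a few times. During the
        // CASSANDRA-4705 window (replica dead but not yet marked down) the
        // first attempt can throw TimedOutException; a later attempt should
        // go to live replicas once the node is convicted.
        static List<ColumnOrSuperColumn> readWithRetry(Cassandra.Client client,
                ByteBuffer key, int maxAttempts) throws Exception {
            ColumnParent parent = new ColumnParent("MyCF");  // placeholder CF
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(
                    ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));

            for (int attempt = 1; ; attempt++) {
                try {
                    return client.get_slice(key, parent, predicate,
                            ConsistencyLevel.LOCAL_QUORUM);
                } catch (TimedOutException e) {
                    if (attempt >= maxAttempts)
                        throw e;  // give up after maxAttempts tries
                    Thread.sleep(1000L * attempt);  // simple linear backoff
                }
            }
        }

        public static void main(String[] args) throws Exception {
            TFramedTransport transport =
                    new TFramedTransport(new TSocket("xx.yy.zz.146", 9160));
            transport.open();
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("MyKS");  // placeholder keyspace
            readWithRetry(client,
                    ByteBuffer.wrap("somekey".getBytes("UTF-8")), 3);
            transport.close();
        }
    }

Most higher-level clients (Hector, Astyanax) already have retry policies built in, so it's worth checking what yours does before rolling your own.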
On Wed, Mar 20, 2013 at 12:53 PM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:

> Further information: in AZ1, when 143, 145, and 146 are up, all goes
> well. But when, say, 143 fails, the client receives a TIMEOUT failure –
> even though 145 and 146 are up.
>
> *From:* Derek Williams [mailto:de...@fyrie.net]
> *Sent:* Wednesday, March 20, 2013 11:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Question regarding multi datacenter and LOCAL_QUORUM
>
> I think I need help with pointing out what the problem is. The log you
> posted only contains references to 143, 145, and 146, which all appear to
> be in the same datacenter as 146?
>
> On Wed, Mar 20, 2013 at 11:29 AM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
>
> Hi,
>
> I have 2 data centers – 3 nodes in each DC – version 1.1.6, replication
> factor 2. Topology properties:
>
> # Cassandra Node IP=Data Center:Rack
> xx.yy.zz.143=AZ1:RAC1
> xx.yy.zz.145=AZ1:RAC1
> xx.yy.zz.146=AZ1:RAC1
> xx.yy.zz.147=AZ2:RAC2
> xx.yy.zz.148=AZ2:RAC2
> xx.yy.zz.149=AZ2:RAC2
>
> Using LOCAL_QUORUM, my understanding was that reads/writes would be
> processed locally (by the coordinator) and requests sent to the remaining
> nodes in the DC, but in the system log for 146 I observe that this is not
> the case. Extract from the log:
>
> DEBUG [Thrift:1] 2013-03-19 00:00:53,312 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,313 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
> DEBUG [Thrift:1] 2013-03-19 00:00:53,334 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,334 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143
> DEBUG [Thrift:1] 2013-03-19 00:00:53,366 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,367 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
> DEBUG [Thrift:1] 2013-03-19 00:00:53,391 CassandraServer.java (line 589) batch_mutate
> DEBUG [Thrift:1] 2013-03-19 00:00:53,418 CassandraServer.java (line 589) batch_mutate
> DEBUG [Thrift:1] 2013-03-19 00:00:53,429 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,429 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.145
> DEBUG [Thrift:1] 2013-03-19 00:00:53,441 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,441 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143
>
> The batch mutates are as expected – locally, two replicas, and hints to DC
> AZ2 – but why the unexpected behavior for the get_slice requests? This is
> observed throughout the log.
>
> Thanks much
>
> --
> Derek Williams

--
Derek Williams
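For reference, a keyspace matching the topology described in the quoted thread would use NetworkTopologyStrategy with two replicas in each datacenter. A minimal cassandra-cli sketch, assuming a hypothetical keyspace name MyKS (the actual schema never appears in the thread):

    create keyspace MyKS
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {AZ1 : 2, AZ2 : 2};

With two replicas per datacenter, LOCAL_QUORUM waits for 2/2 + 1 = 2 local responses, which is consistent with the "Blockfor is 2" lines in the posted log.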