Are those timeouts happening right when the node goes down? If so, it might
be https://issues.apache.org/jira/browse/CASSANDRA-4705
I don't think that issue applies if the node has been down long enough to
be marked as down, though.
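
For what it's worth, a quick sanity check on the quorum math: with 2
replicas per DC, LOCAL_QUORUM has to hear from both local replicas of a
key (consistent with the "Blockfor is 2" lines in your log), so a single
down node in AZ1 is enough to fail requests for the ranges it replicates.
Rough sketch below, not Cassandra's actual code; the class and variable
names are just mine for illustration:

// Rough sketch of the LOCAL_QUORUM arithmetic, assuming a keyspace using
// NetworkTopologyStrategy with AZ1:2, AZ2:2 as described in this thread.
// Illustration only; these names are not Cassandra internals.
public class LocalQuorumSketch {

    // Quorum over n replicas is floor(n / 2) + 1.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    public static void main(String[] args) {
        int localReplicas = 2;                // replicas per DC (RF 2)
        int blockFor = quorum(localReplicas); // floor(2/2) + 1 = 2

        System.out.println("LOCAL_QUORUM blocks for " + blockFor
                + " of " + localReplicas + " local replicas");

        // blockFor == localReplicas, so every local replica of a key must
        // answer; one down node in the local DC stalls requests for the
        // token ranges it replicates until it comes back or is replaced.
        System.out.println("Local replica failures tolerated: "
                + (localReplicas - blockFor)); // prints 0
    }
}

If the goal is to ride through a single node failure per DC at
LOCAL_QUORUM, bumping the per-DC replication factor to 3 would make the
blockfor 2 out of 3 instead of 2 out of 2.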


On Wed, Mar 20, 2013 at 12:53 PM, Dwight Smith
<dwight.sm...@genesyslab.com> wrote:

> Further information: in AZ1, when 143, 145, and 146 are up, all goes
> well. But when, say, 143 fails, the client receives a TIMEOUT failure –
> even though 145 and 146 are up.
>
> From: Derek Williams [mailto:de...@fyrie.net]
> Sent: Wednesday, March 20, 2013 11:50 AM
> To: user@cassandra.apache.org
> Subject: Re: Question regarding multi datacenter and LOCAL_QUORUM
>
> I think I need some help pinpointing what the problem is. The log you
> posted only contains references to 143, 145, and 146, which all appear to
> be in the same datacenter as 146?
>
> On Wed, Mar 20, 2013 at 11:29 AM, Dwight Smith
> <dwight.sm...@genesyslab.com> wrote:
>
> Hi
>
> I have 2 data centers – with 3 nodes in each DC – version 1.1.6 –
> replication factor 2 – topology properties:
>
> # Cassandra Node IP=Data Center:Rack
> xx.yy.zz.143=AZ1:RAC1
> xx.yy.zz.145=AZ1:RAC1
> xx.yy.zz.146=AZ1:RAC1
> xx.yy.zz.147=AZ2:RAC2
> xx.yy.zz.148=AZ2:RAC2
> xx.yy.zz.149=AZ2:RAC2
>
> Using LOCAL_QUORUM, my understanding was that reads/writes would be
> processed locally (by the coordinator) and requests sent to the remaining
> nodes in the DC, but in the system log for 146 I observe that this is not
> the case. Extract from the log:
>
> DEBUG [Thrift:1] 2013-03-19 00:00:53,312 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,313 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
> DEBUG [Thrift:1] 2013-03-19 00:00:53,334 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,334 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143
> DEBUG [Thrift:1] 2013-03-19 00:00:53,366 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,367 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
> DEBUG [Thrift:1] 2013-03-19 00:00:53,391 CassandraServer.java (line 589) batch_mutate
> DEBUG [Thrift:1] 2013-03-19 00:00:53,418 CassandraServer.java (line 589) batch_mutate
> DEBUG [Thrift:1] 2013-03-19 00:00:53,429 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,429 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.145
> DEBUG [Thrift:1] 2013-03-19 00:00:53,441 CassandraServer.java (line 306) get_slice
> DEBUG [Thrift:1] 2013-03-19 00:00:53,441 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143
>
> The batch mutates are as expected – locally, two replicas, and hints to DC
> AZ2 – but why the unexpected behavior for the get_slice requests? This is
> observed throughout the log.
>
> Thanks much
>
> --
> Derek Williams
>



-- 
Derek Williams
