Re: Mutation dropped

aaron morton Thu, 21 Feb 2013 09:21:19 -0800

> What does rpc_timeout control? Only the reads/writes? 
Yes. 

> like data stream,
streaming_socket_timeout_in_ms in the yaml


> merkle tree request? 
Either no time out or a number of days, cannot remember which right now. 

> What is the side effect if it's set to a really small number, say 20ms?
You will probably get a lot more requests that fail with a TimedOutException. 
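
As a rough illustration only (this is not Cassandra code; the function name and the latency numbers are made up), the outcome of a request can be thought of as depending on how many replicas answer before rpc_timeout, compared with the requested CL:

```python
def coordinator_outcome(replica_latencies_ms, cl, rpc_timeout_ms):
    """Hypothetical model: the request succeeds if at least `cl` replicas
    respond within `rpc_timeout_ms`, otherwise the coordinator times out."""
    responded = sum(1 for ms in replica_latencies_ms if ms <= rpc_timeout_ms)
    return "OK" if responded >= cl else "TimedOutException"


# Made-up per-replica response times for one request at RF 3.
latencies = [35.0, 60.0, 400.0]

print(coordinator_outcome(latencies, cl=2, rpc_timeout_ms=10_000))  # OK
print(coordinator_outcome(latencies, cl=2, rpc_timeout_ms=20))      # TimedOutException
```

With the 10 second default the slow third replica does not matter because CL was already met. At 20ms none of the replicas in this example answer in time, so the same request fails.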

rpc_timeout needs to be longer than the time it takes a node to process the 
message, and the time it takes the coordinator to do its thing. You can look 
at cfhistograms and proxyhistograms to get a better idea of how long a request 
takes in your system.  
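
One way to sanity check a candidate value, sketched under the assumption that you have copied some request latencies (in milliseconds) out of those histograms by hand; the helper name and the sample numbers are hypothetical:

```python
def fraction_over_timeout(latencies_ms, timeout_ms):
    """Return the fraction of observed requests slower than `timeout_ms`.
    These are the requests that would be expected to time out."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    slow = sum(1 for ms in latencies_ms if ms > timeout_ms)
    return slow / len(latencies_ms)


# Made-up request latencies in milliseconds.
samples = [2, 5, 8, 15, 30, 45, 120, 900]

print(fraction_over_timeout(samples, 20))      # 0.5
print(fraction_over_timeout(samples, 10_000))  # 0.0
```

If a noticeable fraction of normal requests is slower than the value you are considering, that value is too small for your system.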
  
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 6:56 AM, Wei Zhu <[email protected]> wrote:

> What does rpc_timeout control? Only the reads/writes? How about other 
> inter-node communication, like data stream, merkle tree request?  What is a 
> reasonable value for rpc_timeout? The default value of 10 seconds is way too 
> long. What is the side effect if it's set to a really small number, say 20ms?
> 
> Thanks.
> -Wei
> 
> From: aaron morton <[email protected]>
> To: [email protected] 
> Sent: Tuesday, February 19, 2013 7:32 PM
> Subject: Re: Mutation dropped
> 
>> Does the rpc_timeout not control the client timeout ?
> No, it is how long a node will wait for a response from other nodes before 
> raising a TimedOutException if fewer than CL nodes have responded. 
> Set the client side socket timeout using your preferred client. 
> 
>> Is there any param which is configurable to control the replication timeout 
>> between nodes ?
> There is no such thing.
> rpc_timeout is roughly like that, but it's not right to think about it that 
> way. 
> i.e. if a message to a replica times out and CL nodes have already responded 
> then we are happy to call the request complete. 
> 
> Cheers
> 
>  
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 19/02/2013, at 1:48 AM, Kanwar Sangha <[email protected]> wrote:
> 
>> Thanks Aaron.
>>  
>> Does the rpc_timeout not control the client timeout ? Is there any param 
>> which is configurable to control the replication timeout between nodes ? Or 
>> the same param is used to control that since the other node is also like a 
>> client ?
>>  
>>  
>>  
>> From: aaron morton [mailto:[email protected]] 
>> Sent: 17 February 2013 11:26
>> To: [email protected]
>> Subject: Re: Mutation dropped
>>  
>> You are hitting the maximum throughput on the cluster. 
>>  
>> The messages are dropped because the node fails to start processing them 
>> before rpc_timeout. 
>>  
>> However the request is still a success because the client requested CL was 
>> achieved. 
>>  
>> Testing with RF 2 and CL 1 really just tests the disks on one local machine. 
>> Both nodes replicate each row, and writes are sent to each replica, so the 
>> only thing the client is waiting on is the local node to write to its 
>> commit log. 
>>  
>> Testing with (and running in prod) RF 3 and CL QUORUM is a more real world 
>> scenario. 
>>  
>> Cheers
>>  
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>  
>> @aaronmorton
>> http://www.thelastpickle.com
>>  
>> On 15/02/2013, at 9:42 AM, Kanwar Sangha <[email protected]> wrote:
>> 
>> 
>> Hi – Is there a parameter which can be tuned to prevent the mutations from 
>> being dropped ? Is this logic correct ?
>>  
>> Node A and B with RF=2, CL =1. Load balanced between the two.
>>  
>> --  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.x.x.x  746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
>> UN  10.x.x.x  880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1
>>  
>> Once we hit a very high TPS (around 50k/sec of inserts), the nodes start 
>> falling behind and we see the mutation dropped messages. But there are no 
>> failures on the client. Does that mean the other node is not able to persist 
>> the replicated data ? Is there some timeout associated with replicated data 
>> persistence ?
>>  
>> Thanks,
>> Kanwar
>>  
>> From: Kanwar Sangha [mailto:[email protected]] 
>> Sent: 14 February 2013 09:08
>> To: [email protected]
>> Subject: Mutation dropped
>>  
>> Hi – I am doing a load test using YCSB across 2 nodes in a cluster and 
>> seeing a lot of mutation dropped messages.  I understand that this is due to 
>> the replica not being written to the other node ? RF = 2, CL =1.
>>  
>> From the wiki -
>> For MUTATION messages this means that the mutation was not applied to all 
>> replicas it was sent to. The inconsistency will be repaired by Read Repair 
>> or Anti Entropy Repair
>>  
>> Thanks,
>> Kanwar
>>  
> 
> 
>
