Re: Mutation dropped

Wei Zhu Wed, 20 Feb 2013 09:57:07 -0800

What does rpc_timeout control? Only the reads/writes? How about other 
inter-node communication, like data streaming or merkle tree requests? What is a 
reasonable value for rpc_timeout? The default value of 10 seconds is way too 
long. What is the side effect if it's set to a really small number, say 20ms?


Thanks.
-Wei


________________________________
 From: aaron morton <[email protected]>
To: [email protected] 
Sent: Tuesday, February 19, 2013 7:32 PM
Subject: Re: Mutation dropped
 

> Does the rpc_timeout not control the client timeout ?
No, it is how long a node will wait for responses from other nodes before 
raising a TimedOutException if fewer than CL nodes have responded. 
Set the client-side socket timeout using your preferred client. 

> Is there any param which is configurable to control the replication timeout 
> between nodes ?
There is no such thing. 
rpc_timeout is roughly like that, but it's not right to think about it that 
way. 
That is, if a message to a replica times out and CL nodes have already 
responded, then we are happy to call the request complete. 
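
A minimal sketch of the rule described above, not Cassandra's actual code: the 
coordinator only needs CL acknowledgements inside rpc_timeout, so a slow or 
silent replica does not fail the request. The function names 
(coordinator_outcome, quorum) and the latency figures are hypothetical and 
chosen for illustration only.

```python
def quorum(replication_factor):
    """Number of replicas needed for QUORUM: a strict majority of RF."""
    return replication_factor // 2 + 1


def coordinator_outcome(replica_latencies_ms, required_acks, rpc_timeout_ms):
    """Simplified model of the coordinator's decision for one request.

    replica_latencies_ms: per-replica response time in ms, or None if the
    replica never answers (for example, it dropped the mutation).
    required_acks: the number of replicas the consistency level demands.
    """
    acks_in_time = sum(
        1
        for latency in replica_latencies_ms
        if latency is not None and latency <= rpc_timeout_ms
    )
    # The request is complete as soon as CL replicas have answered in time;
    # late or missing answers from the remaining replicas do not matter.
    if acks_in_time >= required_acks:
        return "success"
    return "TimedOutException"


# RF=2, CL=1: one replica answers in 5 ms, the other is far too slow.
print(coordinator_outcome([5, 12000], 1, 10000))
# Same latencies, but both replicas are required.
print(coordinator_outcome([5, 12000], 2, 10000))
```

With RF=2 and CL=1 the first call succeeds even though one replica missed the 
timeout, which matches a client seeing no failures while mutations are dropped. 
Requiring both replicas turns the same latencies into a timeout.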

Cheers

 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/02/2013, at 1:48 AM, Kanwar Sangha <[email protected]> wrote:

Thanks Aaron.
> 
>Does the rpc_timeout not control the client timeout ? Is there any param which 
>is configurable to control the replication timeout between nodes ? Or the same 
>param is used to control that since the other node is also like a client ?
> 
> 
> 
>From: aaron morton [mailto:[email protected]] 
>Sent: 17 February 2013 11:26
>To: [email protected]
>Subject: Re: Mutation dropped
> 
>You are hitting the maximum throughput on the cluster. 
> 
>The messages are dropped because the node fails to start processing them 
>before rpc_timeout. 
> 
>However the request is still a success because the client requested CL was 
>achieved. 
> 
>Testing with RF 2 and CL 1 really just tests the disks on one local machine. 
>Both nodes replicate each row, and writes are sent to each replica, so the 
>only thing the client is waiting on is the local node to write to its commit 
>log. 
> 
>Testing with (and running in prod) RF3 and CL QUORUM is a more real world 
>scenario. 
> 
>Cheers
> 
>-----------------
>Aaron Morton
>Freelance Cassandra Developer
>New Zealand
> 
>@aaronmorton
>http://www.thelastpickle.com
> 
>On 15/02/2013, at 9:42 AM, Kanwar Sangha <[email protected]> wrote:
>
>
>
>Hi – Is there a parameter which can be tuned to prevent the mutations from 
>being dropped ? Is this logic correct ?
> 
>Node A and B with RF=2, CL =1. Load balanced between the two.
> 
>--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
>UN  10.x.x.x   746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
>UN  10.x.x.x   880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1
> 
>Once we hit a very high TPS (around 50k/sec of inserts), the nodes start 
>falling behind and we see the mutation dropped messages. But there are no 
>failures on the client. Does that mean other node is not able to persist the 
>replicated data ? Is there some timeout associated with replicated data 
>persistence ?
> 
>Thanks,
>Kanwar
> 
> 
>From: Kanwar Sangha [mailto:[email protected]] 
>Sent: 14 February 2013 09:08
>To: [email protected]
>Subject: Mutation dropped
> 
>Hi – I am doing a load test using YCSB across 2 nodes in a cluster and seeing 
>a lot of mutation dropped messages.  I understand that this is due to the 
>replica not being written to the other node ? RF = 2, CL =1.
> 
>From the wiki -
>For MUTATION messages this means that the mutation was not applied to all 
>replicas it was sent to. The inconsistency will be repaired by Read Repair or 
>Anti Entropy Repair
> 
>Thanks,
>Kanwar
>
