> What does rpc_timeout control? Only the reads/writes?
Yes.

> like data stream,
streaming_socket_timeout_in_ms in the yaml.

> merkle tree request?
Either no time out or a number of days, cannot remember which right now.

> What is the side effect if it's set to a really small number, say 20ms?
You will probably get a lot more requests that fail with a TimedOutException.
rpc_timeout needs to be longer than the time it takes a node to process the
message, plus the time it takes the coordinator to do its thing. You can look
at cfhistograms and proxyhistograms to get a better idea of how long a request
takes in your system.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 6:56 AM, Wei Zhu <[email protected]> wrote:

> What does rpc_timeout control? Only the reads/writes? How about other
> inter-node communication, like data stream, merkle tree request? What is the
> reasonable value for rpc_timeout? The default value of 10 seconds is way too
> long. What is the side effect if it's set to a really small number, say 20ms?
>
> Thanks.
> -Wei
>
> From: aaron morton <[email protected]>
> To: [email protected]
> Sent: Tuesday, February 19, 2013 7:32 PM
> Subject: Re: Mutation dropped
>
>> Does the rpc_timeout not control the client timeout ?
> No, it is how long a node will wait for a response from other nodes before
> raising a TimedOutException if less than CL nodes have responded.
> Set the client side socket timeout using your preferred client.
>
>> Is there any param which is configurable to control the replication timeout
>> between nodes ?
> There is no such thing. rpc_timeout is roughly like that, but it's not right
> to think about it that way.
> i.e. if a message to a replica times out and CL nodes have already responded
> then we are happy to call the request complete.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/02/2013, at 1:48 AM, Kanwar Sangha <[email protected]> wrote:
>
>> Thanks Aaron.
>>
>> Does the rpc_timeout not control the client timeout ? Is there any param
>> which is configurable to control the replication timeout between nodes ? Or
>> is the same param used to control that, since the other node is also like a
>> client ?
>>
>> From: aaron morton [mailto:[email protected]]
>> Sent: 17 February 2013 11:26
>> To: [email protected]
>> Subject: Re: Mutation dropped
>>
>> You are hitting the maximum throughput on the cluster.
>>
>> The messages are dropped because the node fails to start processing them
>> before rpc_timeout.
>>
>> However the request is still a success because the client requested CL was
>> achieved.
>>
>> Testing with RF 2 and CL 1 really just tests the disks on one local machine.
>> Both nodes replicate each row, and writes are sent to each replica, so the
>> only thing the client is waiting on is the local node to write to its
>> commit log.
>>
>> Testing with (and running in prod) RF 3 and CL QUORUM is a more real world
>> scenario.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 15/02/2013, at 9:42 AM, Kanwar Sangha <[email protected]> wrote:
>>
>> Hi – Is there a parameter which can be tuned to prevent the mutations from
>> being dropped ? Is this logic correct ?
>>
>> Node A and B with RF=2, CL=1. Load balanced between the two.
>>
>> --  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.x.x.x  746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
>> UN  10.x.x.x  880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1
>>
>> Once we hit a very high TPS (around 50k/sec of inserts), the nodes start
>> falling behind and we see the mutation dropped messages. But there are no
>> failures on the client. Does that mean the other node is not able to persist
>> the replicated data ? Is there some timeout associated with replicated data
>> persistence ?
>>
>> Thanks,
>> Kanwar
>>
>> From: Kanwar Sangha [mailto:[email protected]]
>> Sent: 14 February 2013 09:08
>> To: [email protected]
>> Subject: Mutation dropped
>>
>> Hi – I am doing a load test using YCSB across 2 nodes in a cluster and
>> seeing a lot of mutation dropped messages. I understand that this is due to
>> the replica not being written to the other node ? RF=2, CL=1.
>>
>> From the wiki -
>> For MUTATION messages this means that the mutation was not applied to all
>> replicas it was sent to. The inconsistency will be repaired by Read Repair
>> or Anti Entropy Repair.
>>
>> Thanks,
>> Kanwar
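[Editor's note] The rule that runs through this whole thread — the coordinator
ACKs a write to the client as soon as CL replicas respond within rpc_timeout,
and a replica that drops or misses the mutation after that does not fail the
request — can be sketched as follows. This is an illustrative model with
hypothetical names (coordinator_write, replica_latencies_ms), not Cassandra's
actual code:

```python
def coordinator_write(replica_latencies_ms, consistency_level, rpc_timeout_ms):
    """Model of the coordinator's success rule for one write.

    replica_latencies_ms: per-replica time (ms) to apply the mutation;
    None models a replica that dropped the mutation entirely.
    Returns (client_success, replicas_missed).
    """
    # A replica "counts" only if it applied the mutation within rpc_timeout.
    applied = [t for t in replica_latencies_ms
               if t is not None and t <= rpc_timeout_ms]
    missed = len(replica_latencies_ms) - len(applied)
    # The client sees success iff at least CL replicas responded in time,
    # regardless of how many others timed out or dropped the mutation.
    return len(applied) >= consistency_level, missed


# Kanwar's scenario, RF=2 / CL=1: the fast local replica ACKs in 5 ms while
# the overloaded replica drops the mutation. The client still sees success;
# the inconsistency is left to read repair / anti-entropy repair.
ok, missed = coordinator_write([5, None], consistency_level=1,
                               rpc_timeout_ms=10_000)
```

With RF 3 / CL QUORUM the same rule applies: two timely replica responses make
the write a client-side success even if the third replica times out.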

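[Editor's note] The settings discussed above live in cassandra.yaml. A
fragment with the 1.2-era names and defaults as a reference point — note that
the single rpc_timeout_in_ms was later split into per-operation
*_request_timeout_in_ms settings, so check the names against your release:

```yaml
# cassandra.yaml fragment (setting names from the 1.x era; verify for your version)

# How long the coordinator waits for replica responses before raising
# TimedOutException when fewer than CL replicas have answered.
rpc_timeout_in_ms: 10000

# Socket timeout for streaming operations (e.g. bootstrap and repair data
# streams). 0 means the stream never times out.
streaming_socket_timeout_in_ms: 0
```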