CL.ONE reads and SimpleSnitch unnecessary timeouts

Erik Onnen Wed, 13 Apr 2011 10:32:52 -0700

Sorry for the complex setup, took a while to identify the behavior and
I'm still not sure I'm reading the code correctly.


Scenario:

Six-node ring with SimpleSnitch and RF=3. For the sake of discussion,
assume the token space looks like:

node-0 1-10
node-1 11-20
node-2 21-30
node-3 31-40
node-4 41-50
node-5 51-60

In this scenario we want key 35, for which nodes 3, 4 and 5 are the
natural endpoints. The client is connected to node-0, node-1 or node-2.
node-3 goes into a full GC lasting 12 seconds.
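To make the placement concrete, here is a minimal sketch of how the
natural endpoints fall out of the token table above. It assumes the
simple "owner plus the next RF-1 nodes clockwise" placement described in
this scenario; RING and natural_endpoints are illustrative names, not
Cassandra APIs.

```python
from bisect import bisect_left

# Upper bound of each node's token range, copied from the table above.
RING = [(10, "node-0"), (20, "node-1"), (30, "node-2"),
        (40, "node-3"), (50, "node-4"), (60, "node-5")]

def natural_endpoints(token, rf=3):
    """Return rf replicas: the range owner, then the next nodes clockwise."""
    uppers = [upper for upper, _ in RING]
    # First node whose upper bound is >= token; wrap around past the last node.
    start = bisect_left(uppers, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

print(natural_endpoints(35))  # ['node-3', 'node-4', 'node-5']
```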

What I think we're seeing is that as long as we read with CL.ONE *and*
are connected to node-0, node-1 or node-2, we never get a response for
the requested key until the failure detector kicks in and convicts
node-3, at which point reads spill over to the other endpoints.
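A toy model of the behavior as I understand it, not the actual
Cassandra read path: the coordinator contacts only the first
non-convicted endpoint, so a node that is alive but unresponsive keeps
getting picked. read_cl_one and its arguments are hypothetical names
used only for illustration.

```python
def read_cl_one(endpoints, responsive, convicted):
    """Toy CL.ONE read: contact only the first non-convicted endpoint.

    Returns the node that answered, or None to model a read timeout.
    """
    live = [e for e in endpoints if e not in convicted]
    if not live:
        return None
    target = live[0]
    return target if target in responsive else None

endpoints = ["node-3", "node-4", "node-5"]
responsive = {"node-4", "node-5"}  # node-3 is stuck in a full GC

# Before conviction: node-3 is still chosen, so the read times out.
print(read_cl_one(endpoints, responsive, convicted=set()))
# After conviction: the read spills over to node-4.
print(read_cl_one(endpoints, responsive, convicted={"node-3"}))
```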

We've tested this by switching to CL.QUORUM, and we haven't seen read
timeouts during big GCs since.

Assuming the above, is this behavior really correct? We have copies of
the data on two other nodes, but because this snitch config always
picks node-3, we always time out until conviction, which sometimes takes
up to 8 seconds. Shouldn't the read attempt to pick a different
endpoint after the first timeout rather than repeatedly trying a node
that isn't responding?
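Roughly what I have in mind, as a sketch only: on a timeout, move on to
the next natural endpoint instead of waiting for conviction.
read_with_fallback is a hypothetical helper, and a set of responsive
nodes stands in for real network timeouts.

```python
def read_with_fallback(endpoints, responsive):
    """Try each natural endpoint in order; fall through to the next on timeout.

    Returns (node_that_answered, attempts), or (None, attempts) if all time out.
    """
    for attempts, endpoint in enumerate(endpoints, start=1):
        if endpoint in responsive:
            return endpoint, attempts
    return None, len(endpoints)

endpoints = ["node-3", "node-4", "node-5"]
responsive = {"node-4", "node-5"}  # node-3 is stuck in a full GC

# One timeout against node-3, then node-4 serves the read.
print(read_with_fallback(endpoints, responsive))  # ('node-4', 2)
```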

Thanks,
-erik
