So we're not currently using the dynamic snitch; only the SimpleSnitch
is in play (there's a lot of history as to why that I won't go into).
If switching would solve our problem, I'm fine changing it.
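
For reference, I'm assuming the change is just a cassandra.yaml tweak
along these lines (option names are from my reading of the 0.7-era
yaml, so please double-check them against your version):

    # keep SimpleSnitch as the base topology source
    endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch

    # wrap it with the dynamic snitch so observed read latency feeds
    # back into replica ordering
    dynamic_snitch: true

    # how much worse a replica's score must be before we prefer
    # another one; 0.0 means order purely by latency score
    dynamic_snitch_badness_threshold: 0.0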

Understood re: the client contract. My issue in this case is that the
server we're connected to never tries anything beyond the one failing
server until the failure detector has kicked in - it keeps flogging
the bad server, so subsequent requests never produce a different
result until conviction.
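
To make sure I'm describing it accurately, here's the selection
behavior I think we're hitting, as a schematic in Java (the names and
interfaces below are mine, not the actual StorageProxy/snitch code):

    // Rough sketch of a CL.ONE read path as I understand it; the
    // interfaces are illustrative stand-ins, not Cassandra APIs.
    import java.util.ArrayList;
    import java.util.List;

    class ReadRouting {
        interface FailureDetector { boolean isAlive(String endpoint); }
        interface Snitch { void sortByProximity(String self, List<String> eps); }

        static String pickDataEndpoint(List<String> naturalEndpoints,
                                       String coordinator,
                                       FailureDetector fd,
                                       Snitch snitch) {
            // only endpoints the failure detector hasn't convicted
            List<String> live = new ArrayList<String>();
            for (String ep : naturalEndpoints)
                if (fd.isAlive(ep))
                    live.add(ep);

            // SimpleSnitch ordering is static, so the same replica keeps
            // winning; the dynamic snitch would reorder by latency
            snitch.sortByProximity(coordinator, live);

            // CL.ONE sends the data read to the first endpoint only; a
            // node stuck in a 12s GC is still "alive" here, so every
            // request waits out rpc_timeout until conviction
            return live.get(0);
        }
    }

If that's roughly right, it explains why nothing changes between
attempts: the ordering never does.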

Regarding clients retrying: in this configuration a retry doesn't
improve the situation and still times out, because our client
libraries don't try another host. They still have a valid connection
to a working host; it's just that, given our configuration, that one
node keeps proxying to a bad server and never routes around it. It
sounds like switching to the dynamic snitch would adjust for the first
timeout on subsequent attempts, so maybe that's the most advisable
thing in this case.
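
For what it's worth, my mental model of what the dynamic snitch adds
is something like the following (a schematic only, not the real
DynamicEndpointSnitch; the actual scoring is more involved):

    // Keep a latency score per replica and sort reads by it, so one
    // timed-out read pushes the slow node to the back for later reads.
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class LatencyScores {
        private final Map<String, Double> scores = new HashMap<String, Double>();

        // record an observed read latency; a timeout counts as rpc_timeout
        void recordLatency(String endpoint, double millis) {
            // simple moving average stands in for the real scoring
            scores.put(endpoint, 0.8 * score(endpoint) + 0.2 * millis);
        }

        // order replicas so the best-scoring one gets the data read
        void sortByScore(List<String> replicas) {
            Collections.sort(replicas, new Comparator<String>() {
                public int compare(String a, String b) {
                    return Double.compare(score(a), score(b));
                }
            });
        }

        private double score(String endpoint) {
            Double s = scores.get(endpoint);
            return (s == null) ? 0.0 : s.doubleValue();
        }
    }

If it behaves that way, then after the first timed-out read node-3's
score should push the next read to node-4 or node-5, which is what we
want.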

On Wed, Apr 13, 2011 at 10:58 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> First, our contract with the client says "we'll give you the answer or
> a timeout after rpc_timeout." Once we start trying to cheat on that,
> the client no longer has any guarantee of when it should expect a
> response. So that feels iffy to me.
>
> Second, retrying against a different node isn't expected to give
> substantially better results than the client issuing a retry itself
> if that's what it wants, since by the time we time out once, the FD
> and/or dynamic snitch should route the retried request to another
> node without adding additional complexity to StorageProxy.  (If
> that's not what you see in practice, then we probably have a dynamic
> snitch bug.)
>
> On Wed, Apr 13, 2011 at 12:32 PM, Erik Onnen <eon...@gmail.com> wrote:
>> Sorry for the complex setup; it took a while to identify the behavior,
>> and I'm still not sure I'm reading the code correctly.
>>
>> Scenario:
>>
>> Six-node ring with SimpleSnitch and RF=3. For the sake of discussion,
>> assume the token space looks like:
>>
>> node-0 1-10
>> node-1 11-20
>> node-2 21-30
>> node-3 31-40
>> node-4 41-50
>> node-5 51-60
>>
>> In this scenario we want key 35, for which nodes 3, 4 and 5 are the
>> natural endpoints. The client is connected to node-0, node-1 or
>> node-2, and node-3 goes into a full GC lasting 12 seconds.
>>
>> What I think we're seeing is that as long as we read with CL.ONE *and*
>> are connected to node-0, node-1 or node-2, we never get a response for
>> the requested key until the failure detector kicks in and convicts
>> node-3, at which point reads spill over to the other endpoints.
>>
>> We've tested this by switching to CL.QUORUM and haven't seen read
>> timeouts during big GCs since.
>>
>> Assuming the above, is this behavior really correct? We have copies
>> of the data on two other nodes, but because this snitch config always
>> picks node-3, we always time out until conviction, which can sometimes
>> take up to 8 seconds. Shouldn't the read pick a different endpoint
>> after the first timeout rather than repeatedly trying a node that
>> isn't responding?
>>
>> Thanks,
>> -erik
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
