On Jun 29, 2010, at 1:03 PM, Sylvain Lebresne wrote:

>> Hi all,
>> 
>> I'm having some issues with read consistency level ONE. The Wiki (and other 
>> sources) say the following:
>> 
>> Will return the record returned by the first node to respond. A consistency 
>> check is always done in a background thread to fix any consistency issues 
>> when ConsistencyLevel.ONE is used. This means subsequent calls will have 
>> correct data even if the initial read gets an older value. (This is called 
>> read repair.)
>> 
>> However, looking at the code, it seems that the read is only directed to
>> the first node that is suitable (and alive). This means that a single slow
>> node will cause slow responses even though my replication factor is > 1. I
>> would expect the read to go to all of the suitable nodes, with the first
>> response being used (just as the documentation says).
>> 
>> Moving to QUORUM reads would solve part of this problem, but with one
>> server down and one slow node, I'm back to square one.
> 
> This would not solve even part of the problem.
> 
> When you do a QUORUM read, the value(s) of the requested column(s) are not
> fetched from every replica. Instead, the full value is requested from one
> node and only a digest of the value is requested from the other nodes. This
> is done to avoid excessive inter-cluster transfer (and thus save bandwidth
> and make the read more efficient), since under normal conditions you expect
> all the values to be identical, so transferring all of that data would be
> wasteful. Only if a digest doesn't match the value are the actual values
> requested from the other replicas.
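> 
> Roughly, the digest check looks like this (a simplified sketch with
> made-up names, not the actual Cassandra code):
> 
>   import java.security.MessageDigest;
>   import java.util.Arrays;
>   import java.util.List;
> 
>   public class DigestReadSketch {
>       // MD5 digest of a replica's value, used for the comparison.
>       static byte[] digest(byte[] value) throws Exception {
>           return MessageDigest.getInstance("MD5").digest(value);
>       }
> 
>       // One replica sends the full value, the others only digests.
>       // If any digest disagrees, the real code falls back to full
>       // data reads from the replicas and reconciles the versions.
>       static byte[] quorumRead(byte[] valueFromOneReplica,
>                                List<byte[]> digestsFromOthers) throws Exception {
>           byte[] expected = digest(valueFromOneReplica);
>           for (byte[] d : digestsFromOthers) {
>               if (!Arrays.equals(expected, d)) {
>                   throw new IllegalStateException("digest mismatch, full read needed");
>               }
>           }
>           return valueFromOneReplica;
>       }
>   }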
> 
> The same goes for CL.ONE: the background consistency check (read repair)
> only asks for digests, which saves a lot of internal bandwidth.
> 
> Now, back to the slow-node problem. The code already does its best to ask
> the best-suited node: first by reading the data locally if possible, and
> otherwise by using the EndpointSnitch, which you can configure to tell
> Cassandra which node is best suited.
> There is still the problem of a node that is slow because of a temporary
> issue, either a network problem or because the node is too loaded to keep
> up. But Cassandra chooses to optimize for the normal case rather than the
> error case, which I believe is the right choice.
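> 
> In code, the replica ordering is roughly this (again a toy sketch,
> hypothetical names):
> 
>   import java.util.Comparator;
>   import java.util.List;
>   import java.util.stream.Collectors;
> 
>   public class SnitchSketch {
>       // Stand-in for the configurable EndpointSnitch: lower means "closer".
>       interface EndpointSnitch {
>           int proximity(String from, String to);
>       }
> 
>       // Prefer the local node, then whatever the snitch says is closest.
>       static List<String> orderForReading(String localhost,
>                                           List<String> replicas,
>                                           EndpointSnitch snitch) {
>           return replicas.stream()
>               .sorted(Comparator.comparingInt((String r) ->
>                   r.equals(localhost) ? -1 : snitch.proximity(localhost, r)))
>               .collect(Collectors.toList());
>       }
>   }
> 
> A dynamic snitch goes a step further and scores replicas by observed
> latency instead of a static notion of distance.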

Thanks for the explanation. It looks like the dynamic endpoint snitch in the
0.7 release will help me here.

Greetings,

Wouter
