On Jun 29, 2010, at 1:03 PM, Sylvain Lebresne wrote:

>> Hi all,
>>
>> I'm having some issues with read consistency level ONE. The Wiki (and other
>> sources) say the following:
>>
>> Will return the record returned by the first node to respond. A consistency
>> check is always done in a background thread to fix any consistency issues
>> when ConsistencyLevel.ONE is used. This means subsequent calls will have
>> correct data even if the initial read gets an older value. (This is called
>> read repair.)
>>
>> However, when looking at the code, it seems that the read is only directed
>> to the first node that is suitable (and alive). This means that a slow node
>> will cause slow responses even though my replication factor is > 1. I would
>> expect the read to go to all the suitable nodes, with the first response
>> used as the reply (just as the documentation says).
>>
>> Moving to QUORUM reads would solve part of this problem, but with one
>> server down and one slow one, I'm back to square one.
>
> Actually, it would not solve even part of the problem.
>
> When you do a QUORUM read, the requested column values are not fetched from
> every replica. Instead, the full value is requested from one node, and only
> a digest of the value is requested from the other nodes. This is done to
> limit inter-cluster traffic (saving bandwidth and thus making reads more
> efficient), since under normal conditions all replicas hold exactly the same
> value and transferring every copy would be wasteful. Only if a digest does
> not match the value are the actual values requested.
>
> The same holds for CL.ONE: the background consistency check (read repair)
> only asks for digests, which saves a lot of internal bandwidth.
>
> Now, back to the slow-node problem. The code already does its best to ask
> the best-suited node: first by reading the data locally when possible, and
> then by using the EndpointSnitch, which you can configure to tell Cassandra
> which node is best suited.
> A node can still be slow because of a temporary problem, either a network
> issue or because it is too loaded and cannot keep up. But Cassandra chooses
> to optimize for the normal case rather than the error case, which I believe
> is the right choice.
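[Editor's note: for readers following along, here is a minimal, self-contained sketch of the digest-read scheme Sylvain describes: the coordinator asks one replica for the full value and the others only for a digest, and falls back to full reads (and read repair) only on a mismatch. All class and method names below are invented for illustration, and the MD5 digest is an assumption; this is not Cassandra's actual code.]

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

// Toy model of a digest read: one "data" request, N-1 "digest" requests.
public class DigestReadSketch {

    // Stand-in for a replica node holding one value for the requested column.
    static class Replica {
        final String name;
        final byte[] value;
        Replica(String name, String value) {
            this.name = name;
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }
        byte[] readValue() { return value; }                 // "data" request
        byte[] readDigest() throws Exception {               // "digest" request
            return MessageDigest.getInstance("MD5").digest(value);
        }
    }

    // Coordinator-side read over the replicas for a key.
    static byte[] read(List<Replica> replicas) throws Exception {
        Replica dataNode = replicas.get(0);   // best-suited node per the snitch
        byte[] value = dataNode.readValue();
        byte[] expected = MessageDigest.getInstance("MD5").digest(value);

        for (Replica r : replicas.subList(1, replicas.size())) {
            if (!Arrays.equals(expected, r.readDigest())) {
                // Mismatch: in the real system this is where the full values
                // would be fetched, the newest resolved, and stale replicas
                // written back to (read repair). Here we just report it.
                System.out.println("digest mismatch on " + r.name
                        + ": full read + read repair needed");
            }
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        List<Replica> replicas = Arrays.asList(
                new Replica("node1", "v2"),
                new Replica("node2", "v2"),
                new Replica("node3", "v1"));   // stale replica
        System.out.println(new String(read(replicas), StandardCharsets.UTF_8));
    }
}

Note how the happy path moves only one full value plus small digests across the cluster, which is exactly the bandwidth saving Sylvain points out.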
Thanks for the explanation. It looks like the dynamic endpoint snitch in the 0.7 release will help me here.

Greetings,
Wouter
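[Editor's note: as a footnote on the dynamic snitch Wouter mentions, the rough idea is to score replicas by recently observed latency and prefer the fastest, so a temporarily slow node stops being chosen as the data node. The sketch below is hypothetical, with invented names; Cassandra's real implementation differs in detail.]

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough sketch of a latency-aware snitch: keep a moving average of response
// latency per replica and sort candidates fastest-first.
public class DynamicSnitchSketch {
    // Exponentially weighted moving average of latency per replica, in ms.
    private final Map<String, Double> latencyEwma = new HashMap<>();
    private static final double ALPHA = 0.75; // weight of the newest sample

    // Record a latency sample after each response from a replica.
    void recordLatency(String replica, double millis) {
        latencyEwma.merge(replica, millis,
                (old, sample) -> ALPHA * sample + (1 - ALPHA) * old);
    }

    // Order candidate replicas, fastest (lowest score) first.
    List<String> orderedReplicas(List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (String r) -> latencyEwma.getOrDefault(r, 0.0)));
        return sorted;
    }

    public static void main(String[] args) {
        DynamicSnitchSketch snitch = new DynamicSnitchSketch();
        snitch.recordLatency("node1", 2.0);
        snitch.recordLatency("node2", 250.0); // temporarily slow node
        snitch.recordLatency("node3", 3.0);
        // node2 drops to the back of the preference list:
        System.out.println(snitch.orderedReplicas(
                Arrays.asList("node1", "node2", "node3")));
    }
}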