Re: How does Cassandra handle failure during synchronous writes

2011-02-24 Thread Jonathan Ellis
This is where things start getting subtle. If Cassandra's failure detector knows ahead of time that not enough replicas are available, that is the only time we truly fail a write, and nothing will be written anywhere. But if a write starts during the window where a node is failed but we don't know
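A minimal sketch in plain Python (names and structure assumed, not Cassandra's actual code) of the two failure modes Jonathan distinguishes: an UnavailableException fails the write before anything is written, while a timeout during the unknown-failure window can leave the value on some replicas even though the client sees a failure.

```python
# Toy model (not Cassandra's code) of the two write-failure modes:
# UnavailableException -> nothing written anywhere;
# TimedOutException   -> some replicas may already hold the value.

class UnavailableException(Exception): pass
class TimedOutException(Exception): pass

class Replica:
    def __init__(self, alive_according_to_fd, actually_alive):
        self.alive_according_to_fd = alive_according_to_fd
        self.actually_alive = actually_alive
        self.data = None
    def apply(self, value):
        if self.actually_alive:
            self.data = value
            return True
        return False                      # died inside the unknown window

def coordinate_write(replicas, required, value):
    alive = [r for r in replicas if r.alive_according_to_fd]
    if len(alive) < required:
        raise UnavailableException("not enough replicas; nothing written")
    acks = sum(1 for r in alive if r.apply(value))
    if acks < required:
        # the value already sits on `acks` replicas, yet the client sees a failure
        raise TimedOutException(f"only {acks}/{required} acks")
    return "ok"

# A node fails just before the write, before the failure detector notices:
replicas = [Replica(True, True), Replica(True, False), Replica(True, False)]
try:
    coordinate_write(replicas, required=2, value="v1")
except TimedOutException as e:
    print(e, "- but replica 0 now holds:", replicas[0].data)
```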

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
>>c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that was written to node1 will be returned. >>In this case - N1 will be identified as a discrepancy and the change will be discarded via read repair [Naren] How will Cassandra know this is a discrepancy? On Wed, Feb 23, 201

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
>In this case - N1 will be identified as a discrepancy and the change will be discarded via read repair. Brilliant. This does sound correct :) One more related question - how are read repairs protected against a quorum write that is in progress? E.g., say nodes A, B, C and client C1 intends to

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
>Remember the simple rule. Column with highest timestamp is the one that will be considered correct EVENTUALLY. So consider following case: I am sorry, that will return inconsistent results even at Q. Timestamps have nothing to do with this. It is just an application-provided artifact and could be
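To illustrate Anthony's point that the timestamp is an application-provided artifact, here is a small sketch (plain Python, hypothetical values): last-write-wins reconciliation follows whichever client attached the larger timestamp, regardless of wall-clock order or of whether the write achieved quorum.

```python
import time

# Toy illustration: the column timestamp is supplied by the writing client,
# typically microseconds since the epoch, so reconciliation follows whoever
# attached the larger number, not any notion of a "committed" write.

def client_timestamp(clock_skew_us=0):
    return int(time.time() * 1_000_000) + clock_skew_us

write_a = ("value-from-A", client_timestamp())              # well-synced clock
write_b = ("value-from-B", client_timestamp(-5_000_000))    # clock 5s behind

# Last-write-wins: highest timestamp is kept, even though write_b may have
# been issued later in real time.
winner = max([write_a, write_b], key=lambda col: col[1])
print(winner)   # -> ('value-from-A', ...)
```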

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Thanks Narendra. This is exactly what I was looking for. So the read will return the old value but, at the same time, repair will occur and subsequent reads will return the "new value". But the new value was never written successfully in the first place, as Quorum was never achieved. Isn't that semantically i

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
Remember the simple rule: the column with the highest timestamp is the one that will be considered correct EVENTUALLY. So consider the following case: Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL = QUORUM a. QUORUM in this case requires 2 nodes. Write failed with successful write to on
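A small sketch (plain Python, assumed names) of Narendra's scenario: RF = 3, QUORUM = 2, and a "failed" write that still landed on node1 with a newer timestamp. A quorum read that touches node1 returns the new value and read-repairs the stale replica it contacted.

```python
# Toy model of the case above, not Cassandra's code.

QUORUM = 3 // 2 + 1    # = 2 for RF = 3

nodes = {
    "node1": ("new-value", 200),   # (value, timestamp) - received the failed write
    "node2": ("old-value", 100),
    "node3": ("old-value", 100),
}

def quorum_read(contacted):
    """Return the column with the highest timestamp among the contacted
    replicas, then push it back to the stale ones (read repair)."""
    replies = {n: nodes[n] for n in contacted}
    winner = max(replies.values(), key=lambda col: col[1])
    for n, col in replies.items():
        if col != winner:
            nodes[n] = winner                  # read repair
    return winner

print(quorum_read(["node1", "node2"]))   # ('new-value', 200): the failed write wins
print(quorum_read(["node2", "node3"]))   # node2 was just repaired, so new-value again
```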

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Hi Anthony, I am not talking about the case of CL ANY. I am talking about the case where your consistency levels satisfy R + W > N and you want to write to W nodes but only succeed in writing to X (where X < W) nodes and hence the write is failed back to the client. thanks, Ritesh On Wed, Feb 23, 2011 at 2:48
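A tiny sketch of the R + W > N rule Ritesh is referring to (illustrative Python, not tied to any driver): it is the condition under which any read set and any write set must share at least one replica.

```python
# With N replicas, a read of R replicas and a write of W replicas are
# guaranteed to overlap in at least one replica only when R + W > N.

def overlaps(n, r, w):
    return r + w > n

N = 3
for (r, w) in [(2, 2), (1, 2), (2, 1), (1, 1)]:
    print(f"R={r} W={w} -> guaranteed overlap: {overlaps(N, r, w)}")
# QUORUM/QUORUM (2+2 > 3) overlaps; ONE/ONE (1+1 > 3 is false) does not.
```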

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Ritesh, At CL ANY - if all endpoints are down - a HH is written. And it is a successful write - not a failed write. Now that does not guarantee a READ of the value just written - but that is a risk that you take when you use the ANY CL! HTH, -JA On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwa
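A hedged sketch (plain Python, assumed names) of the CL ANY behaviour Anthony describes: with every replica down the coordinator stores a hint and still reports success, but the value is not readable until the hint is delivered.

```python
# Toy model of a write at CL ANY, not Cassandra's code.

hints = []   # hints held by the coordinator for down replicas

def write_any(replicas, value):
    alive = [r for r in replicas if r["up"]]
    if alive:
        for r in alive:
            r["data"] = value
    else:
        hints.append(value)      # hinted handoff: success with zero live replicas
    return "ok"                  # ANY does not fail on dead endpoints

def read_one(replicas):
    for r in replicas:
        if r["up"] and "data" in r:
            return r["data"]
    return None                  # the hint is invisible to reads

replicas = [{"up": False}, {"up": False}]
print(write_any(replicas, "v1"))   # -> ok (hint stored)
print(read_one(replicas))          # -> None: successful write, not yet readable
```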

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Hi Anthony, while you stated the facts right, I don't see how it relates to the question I asked. Can you elaborate specifically on what happens in the case I mentioned above to Dave? thanks, Ritesh On Wed, Feb 23, 2011 at 1:57 PM, Anthony John wrote: > Seems to me that the explanations are getting

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Seems to me that the explanations are getting incredibly complicated - while I submit the real issue is not! Salient points here:- 1. To be guaranteed data consistency - the writes and reads have to be at Quorum CL or more 2. Any W/R at a lesser CL means that the application has to handle the incons

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
> Read repair will probably occur at that point (depending on your config), which would cause the newest value to propagate to more replicas. Is the newest value the "quorum" value, which means it is the old value that will be written back to the nodes having the "newer non-quorum" value, or the newest

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Dave Revell
Ritesh, You have seen the problem. Clients may read the newly written value even though the client performing the write saw it as a failure. When the client reads, it will use the correct number of replicas for the chosen CL, then return the newest value seen at any replica. This "newest value" co

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Aaron Morton
At CL levels higher than ANY, hinted handoff will be used if enabled. It does not contribute to the number of replicas considered written by the coordinator, though. E.g. if you ask for quorum, and this means 3 nodes, and only 2 are up, the write will fail without starting. In this case the HH is includ
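A small illustrative sketch of Aaron's point (plain Python, assumed names): hints are written for down replicas, but only acks from live replicas count toward the requested consistency level, so a quorum that needs 3 nodes fails outright when only 2 are up.

```python
# Hints do not count toward the achieved consistency level.

def quorum(rf):
    return rf // 2 + 1

def write_quorum(rf, live_replicas):
    required = quorum(rf)
    if live_replicas < required:
        return "UnavailableException (hints alone cannot satisfy QUORUM)"
    return (f"ok after {required} replica acks; "
            f"hints cover the {rf - live_replicas} down node(s)")

print(quorum(5))            # quorum over 5 replicas -> 3 nodes
print(write_quorum(5, 2))   # only 2 up -> fails without starting
print(write_quorum(3, 2))   # RF=3, 2 up -> succeeds, hint for the third
```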

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Javier Canillas
There is something called Hinted Handoff. Suppose that you WRITE something with ConsistencyLevel.ONE on a cluster of 4 nodes. Then the write is done on the corresponding node and an OK is returned to the client, even if the ReplicationFactor of the destination Keyspace is set to a highe

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Aaron Morton
In the case described below, if fewer than CL nodes respond within rpc_timeout (from the conf yaml), the client will get a timeout error. I think most higher-level clients will automatically retry in this case. If there are not enough nodes to start the request, you will get an Unavailable exception. Again
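A minimal client-side sketch of the behaviour Aaron describes (plain Python, hypothetical names, not any specific driver's API): retry on timeouts, where the write may or may not have been applied, and surface Unavailable immediately.

```python
# Illustrative retry policy for the two error cases above.

class TimedOutException(Exception): pass
class UnavailableException(Exception): pass

def write_with_retry(do_write, attempts=3):
    for _ in range(attempts):
        try:
            return do_write()
        except TimedOutException:
            # CL replicas did not ack within rpc_timeout; the write may still
            # have been applied on some replicas, so the retry must be idempotent.
            continue
        except UnavailableException:
            # Not enough replicas to even start the request - no point retrying blindly.
            raise
    raise TimedOutException(f"gave up after {attempts} attempts")
```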

Re: How does Cassandra handle failure during synchronous writes

2011-02-22 Thread Dave Revell
Ritesh, There is no commit protocol. Writes may be persisted on some replicas even though the quorum fails. Here's a sequence of events that shows the "problem": 1. Some replica R fails, but only recently, so its failure has not yet been detected 2. A client writes with consistency > 1 3. The write go