hi Anthony,

While you stated the facts right, I don't see how they relate to the question I asked. Can you elaborate on what specifically happens in the case I mentioned to Dave above?

thanks,
Ritesh
On Wed, Feb 23, 2011 at 1:57 PM, Anthony John <chirayit...@gmail.com> wrote:

> Seems to me that the explanations are getting incredibly complicated -
> while I submit the real issue is not!
>
> Salient points here:
> 1. To be guaranteed data consistency, the writes and reads have to be at
> QUORUM CL or more.
> 2. Any W/R at a lesser CL means that the application has to handle the
> inconsistency, or has to be tolerant of it.
> 3. Writing at the "ANY" CL - a special case - means that writes will
> always go through (as long as any node is up), even if the destination
> nodes are not up. This is done via hinted handoff. But this can result in
> inconsistent reads, and yes, that is a problem - but refer to pt. 2 above.
> 4. At QUORUM CL R/W - after quorum is met, hinted handoffs are used to
> handle the case where a particular node is down and the write needs to be
> replicated to it. But this will not cause inconsistent reads, as the
> hinted handoff (in this case) only applies after quorum is met - so a
> quorum read is not dependent on the down node being up and having got the
> hint.
>
> Hope I state this appropriately!
>
> HTH,
>
> -JA
>
> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> > Read repair will probably occur at that point (depending on your
>> > config), which would cause the newest value to propagate to more
>> > replicas.
>>
>> Is the "newest value" the quorum value - i.e., the old value, which will
>> be written back to the nodes holding the newer, non-quorum value - or is
>> it the real new value? :) If the latter, then this seems kind of odd to
>> me, and I don't see how it would be useful to any application. A bug?
>>
>> Thanks,
>> Ritesh
>>
>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell <d...@meebo-inc.com> wrote:
>>
>>> Ritesh,
>>>
>>> You have seen the problem. Clients may read the newly written value
>>> even though the client performing the write saw it as a failure. When
>>> the client reads, it will use the correct number of replicas for the
>>> chosen CL, then return the newest value seen at any replica. This
>>> "newest value" could be the result of a failed write.
>>>
>>> Read repair will probably occur at that point (depending on your
>>> config), which would cause the newest value to propagate to more
>>> replicas.
>>>
>>> R+W>N guarantees serial order of operations: any read at CL=R that
>>> occurs after a write at CL=W will observe the write. I don't think this
>>> property is relevant to your current question, though.
>>>
>>> Cassandra has no mechanism to "roll back" the partial write, other than
>>> to simply write again. This may also fail.
>>>
>>> Best,
>>> Dave
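Dave's R+W>N guarantee above is just the pigeonhole principle: read sets and write sets are drawn from the same N replicas, so if R + W > N they must share at least one replica, and every read touches at least one replica that acknowledged the write. A toy check of that overlap property (plain Python, not Cassandra client code; the N, W, R values below are illustrative):

    # Toy check of the R + W > N overlap (pigeonhole) argument.
    # Replicas are modeled as integers 0..N-1; no Cassandra involved.
    from itertools import combinations

    N, W, R = 3, 2, 2  # e.g. RF=3 with QUORUM writes and QUORUM reads
    assert R + W > N

    replicas = range(N)
    for write_set in combinations(replicas, W):      # every possible ack set
        for read_set in combinations(replicas, R):   # every possible read set
            # Some replica sits in both sets, so the read always sees
            # at least one copy of the latest acknowledged write.
            assert set(write_set) & set(read_set)
    print("every size-%d read set overlaps every size-%d write set" % (R, W))

Note the overlap only guarantees that the read observes the newest write; as Dave says, nothing is rolled back, so a "failed" write that reached some replicas can still be the newest value a read sees.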
>>>
>>> On Wed, Feb 23, 2011 at 10:12 AM, <tijoriwala.rit...@gmail.com> wrote:
>>>
>>>> Hi Dave,
>>>>
>>>> Thanks for your input. In the steps you mention, what happens when the
>>>> client tries to read the value at step 6? Is it possible that the
>>>> client may see the new value? My understanding was that if R + W > N,
>>>> then the client will not see the new value, as the quorum nodes will
>>>> not agree on the new value. If that is the case, then it's alright to
>>>> return failure to the client. However, if not, then it is difficult to
>>>> program against: after every failure, you as a client are not sure
>>>> whether the failure is a pseudo-failure with some side effects or a
>>>> real failure.
>>>>
>>>> Thanks,
>>>> Ritesh
>>>>
>>>> <quote author='Dave Revell'>
>>>>
>>>> Ritesh,
>>>>
>>>> There is no commit protocol. Writes may be persisted on some replicas
>>>> even though the quorum fails. Here's a sequence of events that shows
>>>> the "problem":
>>>>
>>>> 1. Some replica R fails, but recently, so its failure has not yet been
>>>> detected
>>>> 2. A client writes with consistency > 1
>>>> 3. The write goes to all replicas; all replicas except R persist the
>>>> write to disk
>>>> 4. Replica R never responds
>>>> 5. Failure is returned to the client, but the new value is still in
>>>> the cluster, on all replicas except R
>>>>
>>>> Something very similar could happen for CL QUORUM.
>>>>
>>>> This is a conscious design decision, because a commit protocol would
>>>> constitute tight coupling between nodes, which goes against the
>>>> Cassandra philosophy. But unfortunately you do have to write your app
>>>> with this case in mind.
>>>>
>>>> Best,
>>>> Dave
>>>>
>>>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
>>>> tijoriwala.rit...@gmail.com> wrote:
>>>>
>>>> > Hi,
>>>> > I wanted to get details on how Cassandra does synchronous writes to
>>>> > W replicas (out of N). Does it do a 2PC? If not, how does it deal
>>>> > with failures of nodes before it gets to write to W replicas? If the
>>>> > orchestrating node cannot write to W nodes successfully, I guess it
>>>> > will fail the write operation - but what happens to the completed
>>>> > writes on the X (W > X) nodes?
>>>> >
>>>> > Thanks,
>>>> > Ritesh
>>>> > --
>>>> > View this message in context:
>>>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>>>> > Sent from the cassandra-u...@incubator.apache.org mailing list
>>>> > archive at Nabble.com.
>>>>
>>>> </quote>
>>>> Quoted from:
>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055408.html
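Dave's "simply write again" is the practical takeaway from the original question: there is no 2PC, so a timed-out quorum write has an unknown outcome, and the safe client response is to retry the same idempotent write. A minimal sketch of that pattern, assuming the DataStax Python driver (which postdates this thread) and a hypothetical users table:

    # Sketch: retry a quorum write whose outcome is unknown.
    # The keyspace, table, and column names are made up for illustration.
    from cassandra import ConsistencyLevel, WriteTimeout
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('demo_ks')  # hypothetical keyspace

    write = SimpleStatement(
        "UPDATE users SET email = %s WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM)

    def save_email(user_id, email, retries=3):
        # Rewriting the same value is idempotent, so retrying after a
        # timeout is safe even if the first attempt partially succeeded.
        for _ in range(retries):
            try:
                session.execute(write, (email, user_id))
                return True
            except WriteTimeout:
                continue  # outcome unknown; try again
        return False      # still unknown after retries; surface to caller

Because Cassandra writes are last-write-wins upserts, repeating the identical write converges all replicas to the same value, which is exactly the "write again" recovery Dave describes.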