If you are correct, and you are probably closer to the code, then a CL of Quorum does not guarantee consistency.
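
A minimal sketch of the N1/N2/N3 scenario from the quoted exchange below, assuming a toy in-memory model rather than Cassandra's actual read path. The names `replicas` and `quorum_read` are made up for illustration. It only shows that the result of a quorum read after a failed quorum write depends on which two replicas answer first.

```python
from itertools import combinations

# Toy model (an assumption, not Cassandra code): each replica holds one
# (timestamp, value) pair for the column in question.
TS1, TS2 = 1, 2
replicas = {
    "N1": (TS2, "new"),  # the failed quorum write still landed here
    "N2": (TS1, "old"),
    "N3": (TS1, "old"),
}


def quorum_read(responders):
    """Return the (timestamp, value) pair with the highest timestamp
    among the replicas that answered the read."""
    return max(replicas[name] for name in responders)


# RF=3 and a quorum of 2: any two replicas may be the first to respond.
results = {
    pair: quorum_read(pair) for pair in combinations(sorted(replicas), 2)
}
for pair, (ts, value) in results.items():
    print(pair, "->", "TS2" if ts == TS2 else "TS1", value)
```

In this model the two quorums that include N1 return TS2, while the N2/N3 quorum returns TS1. So the read is not guaranteed to see TS2 until the write has been retried and has succeeded.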

On Thu, Feb 24, 2011 at 10:54 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Thu, Feb 24, 2011 at 5:34 PM, Anthony John <chirayit...@gmail.com> wrote:
>
>> >>Time stamps are not used for conflict resolution - unless it is part
>> >>of the application logic!!!
>>
>> >>What is your definition of conflict resolution? Because if you update
>> >>the same column twice (which I'll call a conflict), then the timestamps
>> >>are used to decide which update wins (which I'll call a resolution).
>>
>> I understand what you are saying, and yes, semantics are very important
>> here. And yes, we are responding to the immediate questions without
>> covering all questions in the thread.
>>
>> The point being made here is that the timestamp of the column is not used
>> by Cassandra to figure out what data to return.
>
> Not quite true.
>
>> E.g. - Quorum is 2 nodes - and RF of 3 over N1/2/3
>> A Quorum write comes in and adds/updates the timestamp (TS2) of a
>> particular data element. It succeeds on N1 - fails on N2/3. So the write
>> is returned as failed - right?
>> Now a Quorum read comes in for exactly the same piece of data that the
>> write failed for.
>> So N1 has TS2 but both N2/3 have the old TS (say TS1).
>> And the read succeeds - will it return TS1 or TS2?
>>
>> I submit it will return TS1 - the old TS.
>
> It all depends on which (first 2) nodes respond to the read (since RF=3,
> that can be any two of N1/N2/N3). If N1 is one of the two that make up the
> quorum, then TS2 will be returned, because Cassandra will compare the
> timestamps and decide what to return based on this. If N2/N3 respond,
> however, both timestamps will be TS1 and so, after timestamp resolution, it
> will still be TS1 that is returned.
> So yes, the timestamp is used for conflict resolution.
>
> In your example, you could get TS1 back because a failed write can leave
> your cluster in an inconsistent state. You'd have to retry the quorum write,
> and only when it succeeds can you be guaranteed that a quorum read will
> always return TS2.
>
> This is because when a write fails, Cassandra doesn't guarantee that the
> write did not make it in (there is no revert).
>
>> Are we on the same page with this interpretation?
>>
>> Regards,
>>
>> -JA
>>
>> On Thu, Feb 24, 2011 at 10:12 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>
>>> On Thu, Feb 24, 2011 at 4:52 PM, Anthony John <chirayit...@gmail.com> wrote:
>>>
>>>> Sylvain,
>>>>
>>>> Time stamps are not used for conflict resolution - unless it is part of
>>>> the application logic!!!
>>>
>>> What is your definition of conflict resolution? Because if you update
>>> the same column twice (which I'll call a conflict), then the timestamps
>>> are used to decide which update wins (which I'll call a resolution).
>>>
>>>> You can have "lost updates" w/Cassandra. You need to use 3rd-party
>>>> products - cages, for example - to get ACID-type consistency.
>>>
>>> Then again, you'll have to define what you are calling "lost updates".
>>> Provided you use a reasonable consistency level, Cassandra provides a
>>> fairly strong durability guarantee, so for some definitions you don't
>>> "lose updates".
>>>
>>> That being said, I never claimed that Cassandra provides any ACID
>>> guarantee. ACID relates to transactions, which Cassandra doesn't support.
>>> If we're talking about the guarantees of transactions, then by all means,
>>> Cassandra won't provide them. And yes, you can use cages or the like to
>>> get transactions. But that was not the point of the thread, was it? The
>>> thread is about vector clocks, and that has nothing to do with
>>> transactions (vector clocks certainly don't give you transactions).
>>>
>>> Sorry if I wasn't clear in my mail, but I was only responding to why, so
>>> far, I don't think vector clocks would really provide much for Cassandra.
>>>
>>> --
>>> Sylvain
>>>
>>>> -JA
>>>>
>>>> On Thu, Feb 24, 2011 at 7:41 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>
>>>>> On Thu, Feb 24, 2011 at 3:22 AM, Anthony John <chirayit...@gmail.com> wrote:
>>>>>
>>>>>> Apologies: for some reason my response to the original mail keeps
>>>>>> bouncing back, thus this new one!
>>>>>>
>>>>>> > On the other hand, the same article says:
>>>>>> > "For conditional writes to work, the condition must be evaluated at
>>>>>> > all update sites before the write can be allowed to succeed."
>>>>>> >
>>>>>> > This means that when doing such an update, CL=ALL must be used
>>>>>>
>>>>>> Sorry, but I am confused by that entire thread!
>>>>>>
>>>>>> Questions:-
>>>>>> 1. Does Cassandra implement any kind of data locking - at any
>>>>>> granularity, whether it be row/colF/Col?
>>>>>
>>>>> No locking, no.
>>>>>
>>>>>> 2. If the answer to 1 above is NO! - how does CL ALL prevent
>>>>>> conflicts? Concurrent updates on exactly the same piece of data on
>>>>>> different nodes can still mess each other up, right?
>>>>>
>>>>> Not sure why you are singling out CL.ALL specifically. But at any CL,
>>>>> updating the same piece of data means updating the same column value.
>>>>> In that case, the resolution rules are the following:
>>>>> - If the updates have different timestamps, keep the one with the
>>>>> higher timestamp. That is, the more recent of the two updates wins.
>>>>> - If the timestamps are the same, then it compares the values (byte
>>>>> comparison) and keeps the highest value. This is just to break ties in
>>>>> a consistent manner.
>>>>>
>>>>> So if you do two truly concurrent updates (that is, from two places at
>>>>> the same instant), then you'll end up with one of the updates. This is
>>>>> at the column level.
>>>>>
>>>>> However, if that simple conflict detection/resolution mechanism is not
>>>>> good enough for some of your use cases and you need to keep two
>>>>> concurrent updates, it is easy enough. Just make sure that the updates
>>>>> don't end up in the same column. This is easily achieved by appending
>>>>> some unique identifier to the column name, for instance. And when
>>>>> reading, do a slice and reconcile whatever you get back with whatever
>>>>> logic makes sense. If you do that, congrats, you've roughly emulated
>>>>> what vector clocks would do. Btw, no locking or anything needed.
>>>>>
>>>>> In my experience, for most things the timestamp resolution is enough.
>>>>> If the same user updates their profile picture twice on your web site
>>>>> in the same microsecond, it's usually fine to end up with one of the
>>>>> two pictures. In the rare case where you need something more specific,
>>>>> using the Cassandra data model usually solves the problem easily. The
>>>>> reason for not having vector clocks in Cassandra is that, so far, we
>>>>> haven't really found many examples where that is not the case.
>>>>>
>>>>> --
>>>>> Sylvain
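
The resolution rules and the unique-column workaround described in the thread can be sketched roughly as follows. This is a toy illustration, not Cassandra code: a plain dict stands in for a row, and `reconcile`, `write_distinct` and `read_slice` are hypothetical helper names. It also assumes a column version can be modelled as a `(timestamp, value_bytes)` tuple.

```python
import uuid


def reconcile(a, b):
    """Pick the winner between two versions of the same column.

    Each version is a (timestamp, value_bytes) tuple. The higher timestamp
    wins; on a tie, the greater value by byte comparison wins.
    """
    return max(a, b)  # tuple comparison: timestamp first, then bytes


# Different timestamps: the more recent update wins.
assert reconcile((10, b"old"), (11, b"new")) == (11, b"new")
# Same timestamp: byte comparison breaks the tie deterministically.
assert reconcile((10, b"apple"), (10, b"banana")) == (10, b"banana")

# Keeping both concurrent updates: give each its own column by appending
# a unique identifier to the column name, then slice and reconcile on read.
row = {}  # hypothetical stand-in for one row: column name -> version


def write_distinct(row, base_name, timestamp, value):
    """Store the update under a column name made unique with a UUID."""
    column_name = f"{base_name}:{uuid.uuid4().hex}"
    row[column_name] = (timestamp, value)
    return column_name


def read_slice(row, base_name):
    """Return every version stored under the base name's prefix."""
    prefix = base_name + ":"
    return [v for name, v in row.items() if name.startswith(prefix)]


write_distinct(row, "picture", 10, b"cat.png")
write_distinct(row, "picture", 10, b"dog.png")
versions = read_slice(row, "picture")  # both updates survive
```

With distinct column names, neither update overwrites the other. The application then decides how to merge `versions` on read, which is the part vector clocks would otherwise help with.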