I'd be interested in what the folks who want CAS implementations think about vector clocks. Can you use them to fulfill your use cases? If not, why not?
I ask because I have found myself wanting CAS in Cassandra too, but I think that's only because I'm pretty familiar with HTTP. I think vector clocks with client merge give you essentially the same functionality, but in a way that fits much more nicely with the rest of the Cassandra architecture. CAS really exacerbates Cassandra's weaknesses. Mike On Tue, Jun 22, 2010 at 4:52 PM, Rishi Bhardwaj <khichri...@yahoo.com>wrote: > > > S>: An *atomic* CAS is another beast and I see at least two difficulties: > > S>: 1) making it atomic locally: Cassandra's implementation is very much > >multi-threaded. On a given node, while you're > reading-comparing-and-swapping > >on some column c, no other thread should be allowed to write c (even > 'normal' > >write). You would probably need to have specific column families where CAS > is > >allowed and for which all writes would be slower (since some locking would > be > >involved). Even then, making such locking efficient and right is not easy. > But > >in the end, local atomicity is quite probably the easy part. > > R: I am curious as to how does Cassandra handle two concurrent writes to > the same column right now? Is there any locking on the write path to > serialize two writes to the same column? If there is any locking then CAS > can build on that. If there is no such locking then we could exclude normal > writes from the synchronization/locking required for CAS. So the normal > write path remains the same, and we let the client know that atomic CAS > wouldn't work if normal writes are also happening on the same column values. > In short a client should not mix normal writes with Atomic CAS for writing > some column value. This will hopefully make things simpler. > > S:>2) making it atomic cluster-wide: data is replicated and an atomic CAS > would > >need to apply on the exact same column version in every node. Which, with > >eventual consistency especially, is pretty hard to accomplish unless > you're > >locking the cluster (but that's what Cages/ZK do). > > R: For starters it would be great if atomic CAS could work for consistency > level Quorum and ALL and not be supported for other consistency levels. Even > for other consistency levels what would stop CAS to work? Why would one > require cluster wide locking? I might be mistaken here but the atomic CAS > operation would happen individually at all the replica nodes (either > directly or through hinted writes) and would succeed or fail depending on > the timestamp/version of the column at the replica. If we do Quorum reads > and CAS writes then we can also be sure about consistency. > > S:>That being said, if you have a neat solution for efficient and > distributed > >atomic CAS that doesn't require rewriting 80% of Cassandra, I'm sure there > >will be interest in that. > > > R: That sounds great. I am definitely going to look into this and report > back if I have a good solution. > > > Thanks, > Rishi > > > > > ________________________________ > From: Sylvain Lebresne <sylv...@yakaz.com> > To: dev@cassandra.apache.org > Sent: Tue, June 22, 2010 1:21:51 AM > Subject: Re: Atomic Compare and Swap > > On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj <khichri...@yahoo.com> > wrote: > > I have read the post on cages and it is definitely very interesting. But > > cages seems to be too coarse grained compared to an Atomic Compare and > Swap > > on Cassandra column value. Cages would makes sense when one wants to do > > multiple atomic row, column updates. Also, I am not so sure about the > > scalability when it comes to using zookeeper for keeping locks on > Cassandra > > columns... there would also be performance hit with an added RPC for > every > > write. I feel Cages maybe fine for systems when one has few locks but I > feel > > an atomic CAS in Cassandra would help us avoid distributed locking > systems > > and zookeeper in many other simpler scenarios. For more complicated > > (transaction like) things, using Cages may be fine. Then again doing a > read > > before write for CAS in cassandra will make CAS at least as slow as a > read, > > which I believe will still be better than taking a single column lock > from > > zookeeper. > > > > What do other folks think in this regard? From whatever I have read, I > > believe CAS is feasible in Cassandra without hurting the normal write > path > > performance. Only for CAS writes would we have to pay for the read before > > write penalty. I am going to do feasibility study for this and would love > > any pointers from others about this. > > Making a (non atomic) CAS is easy (doing it client side is fine, and there > has been some discussion about 'callbacks' that may or may not someday > allow > to do that server-side). > > An *atomic* CAS is another beast and I see at least two difficulties: > > 1) making it atomic locally: Cassandra's implementation is very much > multi-threaded. On a given node, while you're > reading-comparing-and-swapping > on some column c, no other thread should be allowed to write c (even > 'normal' > write). You would probably need to have specific column families where CAS > is > allowed and for which all writes would be slower (since some locking would > be > involved). Even then, making such locking efficient and right is not easy. > But > in the end, local atomicity is quite probably the easy part. > > 2) making it atomic cluster-wide: data is replicated and an atomic CAS > would > need to apply on the exact same column version in every node. Which, with > eventual consistency especially, is pretty hard to accomplish unless you're > locking the cluster (but that's what Cages/ZK do). > > That being said, if you have a neat solution for efficient and distributed > atomic CAS that doesn't require rewriting 80% of Cassandra, I'm sure there > will be interest in that. > > -- > Sylvain > > > > > Thanks, > > Rishi > > > > > > > > ________________________________ > > From: Rauan Maemirov <ra...@maemirov.com> > > To: dev@cassandra.apache.org > > Sent: Mon, June 21, 2010 11:27:02 AM > > Subject: Re: Atomic Compare and Swap > > > > Have you read this post? > > > http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/ > > I guess, you will like it. > > > > 2010/6/22 Rishi Bhardwaj <khichri...@yahoo.com> > > > >> I am definitely interested in taking this work up. I believe the CAS > >> functionality would help in a lot of different scenarios and could help > >> avoid use of other external services (like zookeeper) to provide similar > >> functionality. I am new at Cassandra development and would really > appreciate > >> pointers from the dev. community about how to approach/start on this > >> project. Also how feasible is the approach mentioned below to implement > the > >> CAS functionality? It would be great if we could have a discussion on > the > >> pros and cons. > >> > >> Thanks, > >> Rishi > >> > >> > >> > >> ________________________________ > >> From: Sriram Srinivasan <sri...@malhar.net> > >> To: dev@cassandra.apache.org > >> Sent: Sun, June 20, 2010 9:47:37 PM > >> Subject: Re: Atomic Compare and Swap > >> > >> > >> I too am interested in a CAS facility. > >> > >> I like Rishi's proposal. One could simply use a version number as the > >> logical timestamp. If we promote CAS to a consistency level, it would > rate > >> higher than a quorum. One pays the price for a more complex write path > to > >> obtain the requisite guarantee. > >> > >> > >> On Jun 21, 2010, at 4:03 AM, Rishi Bhardwaj wrote: > >> > >> > > >> > Heres another thought I had, if say the user always wrote with quorum > (or > >> to all) nodes then can't we implement CAS (compare and swap) assuming > that > >> user employs logical timestamp and Cassandra doesn't allow writes to a > >> column with same or older timestamp. Here's the scenario I am thinking > >> about: > >> > Say we use logical timestamp for a column value and lets assume the > >> current timestamp is t. Now say two clients read this column and > generate > >> concurrent CAS (compare and swap) operations on timestamp t and for both > the > >> writes the resulting new timestamp would become (t+1). Now if we don't > allow > >> writes to a column with same timestamp then only one of these writes > would > >> succeed. Of course another assumption is that if a third CAS write with > >> compare on logical timestamp (t - 1) came in, that would be denied as I > >> believe Cassandra doesn't allow "older" writes to win over "newer" > writes. > >> Do you think such a thing can be accomplished? > >> > >> > >> > >> > > > > >