S>: An *atomic* CAS is another beast and I see at least two
difficulties:
S>: 1) making it atomic locally: Cassandra's implementation is very
much
multi-threaded. On a given node, while you're
reading-comparing-and-swapping
on some column c, no other thread should be allowed to write c (even
'normal'
write). You would probably need to have specific column families
where CAS
is
allowed and for which all writes would be slower (since some
locking would
be
involved). Even then, making such locking efficient and right is
not easy.
But
in the end, local atomicity is quite probably the easy part.
R: I am curious as to how does Cassandra handle two concurrent
writes to
the same column right now? Is there any locking on the write path to
serialize two writes to the same column? If there is any locking
then CAS
can build on that. If there is no such locking then we could
exclude normal
writes from the synchronization/locking required for CAS. So the
normal
write path remains the same, and we let the client know that atomic
CAS
wouldn't work if normal writes are also happening on the same
column values.
In short a client should not mix normal writes with Atomic CAS for
writing
some column value. This will hopefully make things simpler.
S:>2) making it atomic cluster-wide: data is replicated and an
atomic CAS
would
need to apply on the exact same column version in every node.
Which, with
eventual consistency especially, is pretty hard to accomplish unless
you're
locking the cluster (but that's what Cages/ZK do).
R: For starters it would be great if atomic CAS could work for
consistency
level Quorum and ALL and not be supported for other consistency
levels. Even
for other consistency levels what would stop CAS to work? Why would
one
require cluster wide locking? I might be mistaken here but the
atomic CAS
operation would happen individually at all the replica nodes (either
directly or through hinted writes) and would succeed or fail
depending on
the timestamp/version of the column at the replica. If we do Quorum
reads
and CAS writes then we can also be sure about consistency.
S:>That being said, if you have a neat solution for efficient and
distributed
atomic CAS that doesn't require rewriting 80% of Cassandra, I'm
sure there
will be interest in that.
R: That sounds great. I am definitely going to look into this and
report
back if I have a good solution.
Thanks,
Rishi
________________________________
From: Sylvain Lebresne <sylv...@yakaz.com>
To: dev@cassandra.apache.org
Sent: Tue, June 22, 2010 1:21:51 AM
Subject: Re: Atomic Compare and Swap
On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj <khichri...@yahoo.com
>
wrote:
I have read the post on cages and it is definitely very
interesting. But
cages seems to be too coarse grained compared to an Atomic Compare
and
Swap
on Cassandra column value. Cages would makes sense when one wants
to do
multiple atomic row, column updates. Also, I am not so sure about
the
scalability when it comes to using zookeeper for keeping locks on
Cassandra
columns... there would also be performance hit with an added RPC for
every
write. I feel Cages maybe fine for systems when one has few locks
but I
feel
an atomic CAS in Cassandra would help us avoid distributed locking
systems
and zookeeper in many other simpler scenarios. For more complicated
(transaction like) things, using Cages may be fine. Then again
doing a
read
before write for CAS in cassandra will make CAS at least as slow
as a
read,
which I believe will still be better than taking a single column
lock
from
zookeeper.
What do other folks think in this regard? From whatever I have
read, I
believe CAS is feasible in Cassandra without hurting the normal
write
path
performance. Only for CAS writes would we have to pay for the read
before
write penalty. I am going to do feasibility study for this and
would love
any pointers from others about this.
Making a (non atomic) CAS is easy (doing it client side is fine,
and there
has been some discussion about 'callbacks' that may or may not
someday
allow
to do that server-side).
An *atomic* CAS is another beast and I see at least two difficulties:
1) making it atomic locally: Cassandra's implementation is very much
multi-threaded. On a given node, while you're
reading-comparing-and-swapping
on some column c, no other thread should be allowed to write c (even
'normal'
write). You would probably need to have specific column families
where CAS
is
allowed and for which all writes would be slower (since some
locking would
be
involved). Even then, making such locking efficient and right is
not easy.
But
in the end, local atomicity is quite probably the easy part.
2) making it atomic cluster-wide: data is replicated and an atomic
CAS
would
need to apply on the exact same column version in every node.
Which, with
eventual consistency especially, is pretty hard to accomplish
unless you're
locking the cluster (but that's what Cages/ZK do).
That being said, if you have a neat solution for efficient and
distributed
atomic CAS that doesn't require rewriting 80% of Cassandra, I'm
sure there
will be interest in that.
--
Sylvain
Thanks,
Rishi
________________________________
From: Rauan Maemirov <ra...@maemirov.com>
To: dev@cassandra.apache.org
Sent: Mon, June 21, 2010 11:27:02 AM
Subject: Re: Atomic Compare and Swap
Have you read this post?
http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
I guess, you will like it.
2010/6/22 Rishi Bhardwaj <khichri...@yahoo.com>
I am definitely interested in taking this work up. I believe the
CAS
functionality would help in a lot of different scenarios and
could help
avoid use of other external services (like zookeeper) to provide
similar
functionality. I am new at Cassandra development and would really
appreciate
pointers from the dev. community about how to approach/start on
this
project. Also how feasible is the approach mentioned below to
implement
the
CAS functionality? It would be great if we could have a
discussion on
the
pros and cons.
Thanks,
Rishi
________________________________
From: Sriram Srinivasan <sri...@malhar.net>
To: dev@cassandra.apache.org
Sent: Sun, June 20, 2010 9:47:37 PM
Subject: Re: Atomic Compare and Swap
I too am interested in a CAS facility.
I like Rishi's proposal. One could simply use a version number as
the
logical timestamp. If we promote CAS to a consistency level, it
would
rate
higher than a quorum. One pays the price for a more complex write
path
to
obtain the requisite guarantee.
On Jun 21, 2010, at 4:03 AM, Rishi Bhardwaj wrote:
Heres another thought I had, if say the user always wrote with
quorum
(or
to all) nodes then can't we implement CAS (compare and swap)
assuming
that
user employs logical timestamp and Cassandra doesn't allow writes
to a
column with same or older timestamp. Here's the scenario I am
thinking
about:
Say we use logical timestamp for a column value and lets assume
the
current timestamp is t. Now say two clients read this column and
generate
concurrent CAS (compare and swap) operations on timestamp t and
for both
the
writes the resulting new timestamp would become (t+1). Now if we
don't
allow
writes to a column with same timestamp then only one of these
writes
would
succeed. Of course another assumption is that if a third CAS
write with
compare on logical timestamp (t - 1) came in, that would be
denied as I
believe Cassandra doesn't allow "older" writes to win over "newer"
writes.
Do you think such a thing can be accomplished?