Hi folks,

I'm working on a project that might benefit from Cassandra's
compare-and-swap operations, and we're wondering whether there are
plausible corner cases in which the implementation fails to maintain
linearizable consistency.  In particular, we're interested in how
Cassandra's Paxos implementation copes with membership changes.  A
couple of examples:

* The membership of the cluster changes while a Paxos transaction is in
progress, in a way that affects the replica set for the partition of
interest.  How does Cassandra account for this?  Is it possible that a
quorum of the original replica set accepts a change, after which the
membership changes such that the accepting nodes no longer constitute a
quorum of the new set?  If so, what happens then?

* Perhaps a variant of the above: a lightweight transaction on some
partition is accepted by a quorum but fails on at least one node at the
commit phase (because, e.g., the Cassandra process on that node dies
before the mutation is applied).  The set of replicas for that partition
then changes due to some update to the cluster's membership before the
Paxos state for that partition is inspected again.  In particular, say
that one member of the quorum that accepted (but did not necessarily
commit) the mutation is no longer a replica for the partition.  Is
linearizable consistency still maintained?  Is it possible for two
serial-consistency reads to inspect the Paxos state of two different
quorums of that replica set -- the first of which includes only nodes
that didn't accept the mutation and the second of which includes at
least one that did -- and thereby return two different answers?  (A
sketch of the read pattern we mean follows these examples.)  Or does
Cassandra take care to ship affected Paxos state to new nodes when they
bootstrap?  If so, how does that work?
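
To make the second scenario concrete, here is a minimal sketch of the
read pattern we have in mind, using the Python cassandra-driver (the
contact point, keyspace, table, and column names are placeholders we've
made up for illustration):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])            # placeholder contact point
    session = cluster.connect('my_keyspace')    # placeholder keyspace

    # A serial-consistency read: ConsistencyLevel.SERIAL makes the
    # coordinator run a Paxos round, which (as we understand it) should
    # surface any in-progress lightweight transaction it observes.
    read = SimpleStatement(
        "SELECT balance FROM accounts WHERE id = 1",  # placeholder schema
        consistency_level=ConsistencyLevel.SERIAL)

    first = session.execute(read).one()
    # ... the replica set for the partition changes here ...
    second = session.execute(read).one()

    # Our question: can first.balance != second.balance if the two
    # reads happen to consult different quorums of the changed set?
    print(first.balance, second.balance)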

If these scenarios (or others like them) do present challenges for
Cassandra's CAS implementation, are there any best practices for
managing them -- for example, only using CAS in clusters with
(comparatively) stable membership?
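
For reference, the kind of CAS operation we're issuing looks roughly
like the following (again the Python driver, with the same made-up
schema; not meant as anything more than a sketch):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('my_keyspace')  # placeholders

    # The IF clause makes this a lightweight transaction, executed via
    # Paxos among the partition's replicas.
    cas = SimpleStatement(
        "UPDATE accounts SET balance = 90 WHERE id = 1 IF balance = 100",
        serial_consistency_level=ConsistencyLevel.SERIAL)
    result = session.execute(cas)

    # was_applied reflects the [applied] column Cassandra returns for
    # conditional statements.
    print(result.was_applied)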

Thanks,
SK
