Hello and welcome!

So this is a really complicated topic, unfortunately, but the simple answer is 
that, as currently formulated, this work won’t address this particular case. The 
slightly longer answer is that this problem should become a thing of the past 
either way: there is work incoming to address every category of this kind of 
problem, though it may take a little longer to land.

The full answer is that membership of a keyspace in Cassandra is a mess, 
derived from the intersection of two things: schema and gossip. The electorate 
verification addresses _gossip_ inconsistencies, that is, inconsistencies about 
which nodes are perceived to be members of the ring. Schema generates the issue 
you are discussing here; in particular, the lack of any state machine that 
transitions from one topology to another when a new schema implies a new 
topology. This is its own distinct problem, which others I work with plan to 
file a CEP for in the coming weeks or months.

In the meantime, the correct way to do this (painful though it might be) is to 
add one replica at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1, 
wait for that to settle, *run repair*, and then bump to RF=2, and so on.
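
For what it’s worth, a rough sketch of that sequence via the Python driver (the 
keyspace name, contact point and DC names below are placeholders for whatever 
your cluster actually uses):

    # Bump DC2's replication factor one step at a time; "ks", the contact
    # point and the DC names are placeholders.
    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()

    for rf in (1, 2, 3):
        session.execute(
            "ALTER KEYSPACE ks WITH replication = "
            "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': %d}" % rf
        )
        # Wait for the schema change to settle everywhere, then run
        # `nodetool repair -full ks` on the DC2 nodes before the next bump.
        input("Repaired DC2 at RF=%d? Press enter for the next step..." % rf)

The important part is the repair between each step, so that the new replicas 
actually hold the data before they start counting towards quorums.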

To respond to Mick: we could introduce an EACH_SERIAL which would permit this 
to be done in one go. This isn’t a super complicated piece of work, and I’d be 
happy to help review a contribution here. However, in my view we should be 
reconsidering how quorums are decided more comprehensively. This is very 
off-topic, but there are other more sensible quorums for multi-region setups 
(such as quorum-of-quorums), and also a wide range of useful quorums we don’t 
support, particularly heterogeneous ones supporting lower write failure 
tolerance than read failure tolerance (for instance). Today we support only the 
most extreme versions of this, and all of our quorums must be mixed manually by 
clients, which is error prone. In my opinion we should be moving towards 
specifying quorums on a per-table basis for reads and writes, so that clients 
do not specify their consistency levels at all. This way the database can 
configure arbitrary quorums, and also guarantee that these quorums provide the 
desired consistency.
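
To sketch what I mean by quorum-of-quorums (purely illustrative; this is not an 
existing or proposed API, and the replica counts are made up):

    # Contrast a quorum-of-quorums with EACH_QUORUM and a plain global QUORUM
    # for a hypothetical three-DC keyspace at RF=3 per DC.

    def majority(acks, total):
        return acks > total // 2

    def quorum_of_quorums(acks_by_dc, rf_by_dc):
        # A majority of datacenters must each see a local majority of acks.
        dcs_with_local_quorum = sum(
            1 for dc, rf in rf_by_dc.items() if majority(acks_by_dc.get(dc, 0), rf)
        )
        return majority(dcs_with_local_quorum, len(rf_by_dc))

    rf = {"DC1": 3, "DC2": 3, "DC3": 3}

    # DC3 is entirely unreachable but DC1 and DC2 each ack 2/3: EACH_QUORUM
    # fails (no local quorum in DC3), global QUORUM fails (4 of 9 < 5), yet
    # quorum-of-quorums succeeds, and any two such quorums still intersect
    # (two majorities of DCs share a DC; two local majorities share a node).
    print(quorum_of_quorums({"DC1": 2, "DC2": 2, "DC3": 0}, rf))  # True

    # Only DC1 reaches a local majority: rejected.
    print(quorum_of_quorums({"DC1": 3, "DC2": 1, "DC3": 1}, rf))  # False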


From: Miles Garnsey <miles.garn...@datastax.com>
Date: Friday, 20 August 2021 at 00:47
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
Long time listener, first time caller here - hello!

I am very interested in this part "Better safety among range movements: 
Electorate verification during range movements provides a stronger assertion of 
linearizability via assurance of the set of instances voting on a transaction.”

I have seen issues in the wild where people want to add/remove DCs. I think 
that there may be a risk of consistency violations due to transactions 
circumventing the locks held by in-progress transactions. Will electorate 
verification help in the below scenario?

1. Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3.
2. DC2 is added, and once all nodes are in UN the schema is adjusted so that 
   DC2’s RF=3.
3. While the new schema propagates, there is a transitional state in which 
   some potential coordinators have the new schema S2, and others are 
   operating on the old schema S1.
4. In this state, S2 coordinators form consensus from 4/6 nodes, while S1 
   coordinators form consensus from 2/3 nodes.
5. A query issued from an S1 coordinator can form a valid consensus which will 
   circumvent the lock held by an S2 coordinator (see the sketch below).
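
To make the gap concrete, here is a tiny sketch of how those two electorates 
can fail to intersect (the node names are hypothetical):

    # Hypothetical nodes: DC1 = {a, b, c} at RF=3, DC2 = {d, e, f} newly added.
    dc1 = {"a", "b", "c"}
    dc2 = {"d", "e", "f"}

    # An S1 coordinator still sees only DC1, so any 2 of its 3 replicas is a
    # valid SERIAL quorum for it.
    s1_quorum = {"b", "c"}

    # An S2 coordinator sees all 6 replicas, so any 4 of the 6 is a valid quorum.
    s2_quorum = {"a", "d", "e", "f"}

    assert s1_quorum <= dc1 and s2_quorum <= dc1 | dc2

    # Both quorums are valid under their respective schemas, yet they are
    # disjoint, so neither promise/accept round is guaranteed to observe the
    # other's in-progress proposal.
    print(s1_quorum & s2_quorum)  # set() -> empty intersection
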
I was thinking of proposing an EACH_QUORUM serial CL, but if electorate 
verification solves the problem then that may be the better solution.

Miles


> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
>
> Benedict, thank you for sharing this CEP!
>
> Adding some notes on why I support this proposal:
>
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
> reads is a huge improvement. This latency reduction may be sufficient to 
> allow many users of Cassandra who operate in a single datacenter, 
> availability zone, or region to migrate to a multi-region topology.
>
> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
> probabilistically-exhaustive validation and simulation of transactional 
> correctness, allowing assertion of linearizability in the presence of 
> adversarial thread scheduling and message ordering over an unbounded number 
> of simulated clusters and transactions.
>
> - Some use cases may see a superlinear increase in LWT performance due to a 
> reduction in contention afforded by fewer message round-trips. E.g., halving 
> latency shortens the interval during which competing transactions may 
> conflict, reducing contention and improving throughput beyond a level that 
> would be afforded by the latency reduction alone.
>
> - Better safety among range movements: Electorate verification during range 
> movements provides a stronger assertion of linearizability via assurance of 
> the set of instances voting on a transaction.
>
> – Scott
>
> ________________________________________
> From: bened...@apache.org <bened...@apache.org>
> Sent: Wednesday, August 18, 2021 2:31 PM
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] CEP 14: Paxos Improvements
>
> RE: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>
> I’m proposing this CEP for approval by the project. The goal is to both 
> improve the performance of LWTs and to ensure their correctness across a 
> range of scenario like range movements. This work builds upon the Simulator 
> CEP that has been recently adopted, and patches will follow in the coming 
> weeks.
>
> If you have any concerns or questions please raise them here for discussion.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
