Hello and welcome! This is a really complicated topic, unfortunately, but the simple answer is that, as currently formulated, this work won’t address this particular case. The slightly longer answer is that this problem will be a thing of the past soon either way: there’s work incoming to address every possible category of this kind of problem, but it might take a little longer.
The full answer is that membership of a keyspace in Cassandra is a mess, and is derived from the intersection of two things: schema and gossip. The electorate verification addresses _gossip_ inconsistencies, that is, inconsistencies about which nodes are perceived to be members of the ring. Schema generates the issue you are discussing here, in particular the lack of any state machine that transitions from one topology to another when a new schema implies a new topology. This is its own distinct problem, which others I work with plan to file a CEP for in the coming weeks or months.

In the meantime, the correct way to do this (painful though it might be) is to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1 and wait for that to settle, *run repair*, and then bump to RF=2, etc.

To respond to Mick: we could introduce an EACH_SERIAL which would permit this to be done in one go. This isn’t a super complicated piece of work, and I’d be happy to help review a contribution here.

However, in my view we should be reconsidering how quorums are decided more comprehensively. This is very off-topic, but there are other, more sensible quorums for multi-region setups (such as quorum-of-quorums), and there’s also a wide range of useful quorums we don’t support, particularly heterogeneous ones that support lower write failure tolerance than read failure tolerance (for instance). Today we support only the most extreme versions of this, and all of our quorums must be mixed manually by clients, which is error-prone. In my opinion we should be moving towards specifying quorums on a per-table basis for reads and writes, so that clients do not specify their consistency levels. This way the database can configure arbitrary quorums, and also guarantee that these quorums provide the desired consistency.
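As a rough illustration of why the one-node-at-a-time advice above matters, here is a minimal Python sketch. It assumes a deliberately simplified model, not Cassandra's actual implementation: a SERIAL operation needs a strict majority of all replicas in the keyspace, and a coordinator computes that majority from whichever schema it currently sees. The replica names and the helpers `quorum_size`, `quorums` and `always_intersect` are hypothetical. The sketch brute-forces whether every quorum under the old view shares at least one replica with every quorum under the new view.

```python
from itertools import combinations


def quorum_size(n):
    """Strict majority of n replicas (simplified model of a SERIAL quorum)."""
    return n // 2 + 1


def quorums(replicas):
    """All minimal quorums (as sets) over the given replicas.

    Larger quorums are supersets of these, so checking the minimal
    ones is enough to decide whether two views can ever miss each other.
    """
    k = quorum_size(len(replicas))
    return [set(c) for c in combinations(sorted(replicas), k)]


def always_intersect(old, new):
    """True if every quorum of `old` shares a replica with every quorum of `new`."""
    return all(q_old & q_new for q_old in quorums(old) for q_new in quorums(new))


# Hypothetical replica names: three in DC1, three in DC2.
dc1 = ["dc1-a", "dc1-b", "dc1-c"]
dc2 = ["dc2-a", "dc2-b", "dc2-c"]

# Jumping straight from DC1 at RF=3 to DC1 at RF=3 plus DC2 at RF=3.
one_go = always_intersect(dc1, dc1 + dc2)

# Growing DC2 one replica at a time: RF=1, then RF=2, then RF=3.
views = [dc1 + dc2[:i] for i in range(len(dc2) + 1)]
stepwise = [always_intersect(a, b) for a, b in zip(views, views[1:])]

print("one go:", one_go)
print("stepwise:", stepwise)
```

In this model the single jump reports `False` (a 2/3 quorum under the old schema can be disjoint from a 4/6 quorum under the new one), while each single-replica step reports `True`. The sketch says nothing about repair or gossip, which still have to be handled as described above.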
From: Miles Garnsey <miles.garn...@datastax.com>
Date: Friday, 20 August 2021 at 00:47
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements

Long time listener, first time caller here - hello!

I am very interested in this part:

"Better safety among range movements: Electorate verification during range movements provides a stronger assertion of linearizability via assurance of the set of instances voting on a transaction."

I have seen issues in the wild where people want to add/remove DCs. I think that there may be a risk of consistency violations due to transactions circumventing the locks held by in-progress transactions. Will electorate verification help in the scenario below?

Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3. DC2 is added, and once all nodes are in UN the schema is adjusted so that DC2’s RF=3.

While the new schema propagates, there is a transitional state in which some potential coordinators have the new schema S2, and others are operating on the old schema S1. In this state, S2 coordinators form consensus from 4/6 nodes, while S1 coordinators form consensus from 2/3 nodes. A query issued from an S1 coordinator can form a valid consensus which will circumvent the lock held by an S2 coordinator.

I was thinking of proposing an EACH_QUORUM serial CL, but if electorate verification solves the problem then that may be the better solution.

Miles

> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
>
> Benedict, thank you for sharing this CEP!
>
> Adding some notes on why I support this proposal:
>
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on
>   reads is a huge improvement. This latency reduction may be sufficient to
>   allow many users of Cassandra who operate in a single datacenter,
>   availability zone, or region to migrate to a multi-region topology.
>
> - The Cluster Simulation work described in CEP-10 provides a toolchain for
>   probabilistically-exhaustive validation and simulation of transactional
>   correctness, allowing assertion of linearizability in the presence of
>   adversarial thread scheduling and message ordering over an unbounded number
>   of simulated clusters and transactions.
>
> - Some use cases may see a superlinear increase in LWT performance due to a
>   reduction in contention afforded by fewer message round-trips. E.g., halving
>   latency shortens the interval during which competing transactions may
>   conflict, reducing contention and improving throughput beyond a level that
>   would be afforded by the latency reduction alone.
>
> - Better safety among range movements: Electorate verification during range
>   movements provides a stronger assertion of linearizability via assurance of
>   the set of instances voting on a transaction.
>
> – Scott
>
> ________________________________________
> From: bened...@apache.org <bened...@apache.org>
> Sent: Wednesday, August 18, 2021 2:31 PM
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] CEP 14: Paxos Improvements
>
> RE:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>
> I’m proposing this CEP for approval by the project. The goal is to both
> improve the performance of LWTs and to ensure their correctness across a
> range of scenarios like range movements. This work builds upon the Simulator
> CEP that has been recently adopted, and patches will follow in the coming
> weeks.
>
> If you have any concerns or questions please raise them here for discussion.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org