> My initial testing suggested it was not required (when the new DC is not
> serving reads).

The problem is that today there’s no way to reliably exclude the new DC from serving reads, as far as I know. If you can, then yes, you would only need to ensure repair was run prior to activating reads from this DC.

> Perhaps the CL mechanism could be pluggable

I think this is unlikely, at least any time soon, particularly as we start to consider things like consensus. Quorums are quite intricately woven into any implementation, and it would be quite hard to fully generalise them. In practice we can probably accommodate any simple vote threshold quorums (those where each member of some electorate has a vote, each vote has equal weight, and consensus is reached once a threshold is crossed) and support at least one level of nesting (so that DCs may logically vote as a block based on some quorum within a DC) in any topology without a plugin system, and I suspect this will be more than enough for any system in the foreseeable future.

> I wonder if it should be a ‘default CL’ which can additionally be overridden
> by queries?

There are some practicalities that probably prohibit us from eliminating user-provided CLs, but I would like to see them phased out as far as possible as they are very hard to verify. To support this flexibility more generally, I’d prefer to see tables offer potentially multiple consensus schemes with potentially different qualities (that can perhaps even be named by the user) for these cases, such as (for instance) fast-and-inconsistent-reads. This still permits their properties to be vetted by the database while offering flexibility to the user, and lets them declare at the operator level what meeting this concept requires. It also means the database can maintain these properties through any topology change.

We’ll probably have people using legacy CLs for another decade, so we’re going to have to support querying with those CLs, but we might want to encourage people to disable them on their clusters and migrate to safer setups.

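To make the "simple vote threshold with one level of nesting" idea concrete, here is a minimal Python sketch. It is only an illustration of the concept described above, not Cassandra code: the class names `Threshold` and `QuorumOfQuorums`, the helper `majority`, and the node names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass
class Threshold:
    """Simple vote-threshold quorum: equal-weight votes, met at `needed` acks."""
    voters: FrozenSet[str]
    needed: int

    def reached(self, acks: Set[str]) -> bool:
        # Only acknowledgements from members of this electorate count.
        return len(self.voters & acks) >= self.needed


def majority(voters: Iterable[str]) -> Threshold:
    """Hypothetical helper: a strict-majority threshold over `voters`."""
    members = frozenset(voters)
    return Threshold(members, len(members) // 2 + 1)


@dataclass
class QuorumOfQuorums:
    """One level of nesting: a DC votes as a block once its own quorum is met."""
    dcs: Dict[str, Threshold]
    needed_dcs: int

    def reached(self, acks: Set[str]) -> bool:
        blocks = sum(1 for quorum in self.dcs.values() if quorum.reached(acks))
        return blocks >= self.needed_dcs


topology = QuorumOfQuorums(
    dcs={
        "DC1": majority({"a1", "a2", "a3"}),
        "DC2": majority({"b1", "b2", "b3"}),
        "DC3": majority({"c1", "c2", "c3"}),
    },
    needed_dcs=2,
)

print(topology.reached({"a1", "a2", "b1", "b3"}))  # True: DC1 and DC2 each have 2 of 3
print(topology.reached({"a1", "a2", "b1", "c1"}))  # False: only DC1 meets its quorum
```

The point of the sketch is that both levels are the same kind of rule (count equal-weight votes against a threshold), which is why a fixed scheme like this could plausibly cover most topologies without a plugin system.
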
From: Miles Garnsey <miles.garn...@datastax.com>
Date: Friday, 20 August 2021 at 12:51
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements

Many thanks for this detailed response Benedict. I look forward to seeing the forthcoming proposals in relation to schema change safety when LWTs are in use.

We have been following something very close to the scale-by-one workaround you described - I am grateful for the additional validation. The only divergence is that we have not been advising a repair in between each node addition. My initial testing suggested it was not required (when the new DC is not serving reads). But if you are aware of issues that arise at scale then I’d love to hear your experience, as we are still in the planning phase for that project.

Regarding CLs (off topic)

> To respond to Mick: we could introduce an EACH_SERIAL which would permit this
> to be done in one go. This isn’t a super complicated piece of work, and I’d
> be happy to help review a contribution here. However, in my view we should be
> reconsidering how quorums are decided more comprehensively. This is very
> off-topic, but there are other more sensible quorums for multi-region setups
> (such as quorum-of-quorums), but also there’s a wide range of useful quorums
> we don’t support, particularly heterogenous ones supporting lower write
> failure tolerance than read failure tolerance (for instance). Today we
> support only the most extreme versions of this, and all of our quorums must
> be mixed manually by clients which is error prone. In my opinion we should be
> moving towards specifying quorums on a per-table basis for reads and writes,
> so that clients do not specify their consistency levels. This way the
> database can configure arbitrary quorums, and also guarantee that these
> quorums provide the desired consistency.

I agree with your points here. I’d add that the geographical location of DCs can be relevant.
Perhaps the CL mechanism could be pluggable (in the same way that authn/z both are) so that we can experiment in this area at higher velocity? (I appreciate this is an invasive change.)

A colleague and I are considering whether we might be able to look at the EACH_QUORUM idea in the shorter term. We will share more if we have the bandwidth to undertake the work.

I also agree that CLs defined for tables is a worthy enhancement; I wonder if it should be a ‘default CL’ which can additionally be overridden by queries?

In any event I feel I’ve hijacked your thread enough, but thank you again for the warm welcome and the interesting discussion!

> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
>
> Hello and welcome!
>
> So this is a really complicated topic, unfortunately, but the simple answer
> is that as currently formulated this work won’t address this particular case.
> The slightly longer answer is that this problem will be a thing of the past
> soon either way - there’s work incoming to address every possible category of
> this kind of problem, but it might take a little longer.
>
> The full answer is that membership of a keyspace in Cassandra is a mess, and
> is derived from the intersection of two things: schema and gossip. The
> electorate verification addresses _gossip_ inconsistencies, that is,
> inconsistencies about what nodes are perceived to be a member of the ring.
> Schema generates the issue you are discussing here. In particular the lack of
> any state machine that transitions from one topology to another when a new
> schema implies a new topology. This is its own distinct problem, that others
> I work with plan to file a CEP for in the coming weeks or months.
>
> In the meantime, the correct way to do this (painful though it might be) is
> to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1
> and wait for that to settle, *run repair* and then bump to RF=2, etc.
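
As a rough illustration of the one-replica-at-a-time procedure quoted above, the following Python sketch prints the sequence of operator steps. It is only a sketch under assumptions: the function name `scale_by_one_steps` and the keyspace name `my_ks` are hypothetical, the wait and repair steps are emitted as comments rather than real commands, and placing a repair after every bump (including the last) is my reading of the "etc." above.

```python
from typing import Dict, Iterator


def scale_by_one_steps(
    keyspace: str, existing: Dict[str, int], new_dc: str, target_rf: int
) -> Iterator[str]:
    """Yield the steps for raising `new_dc` to `target_rf` one replica at a time."""
    for rf in range(1, target_rf + 1):
        replication = {"class": "NetworkTopologyStrategy", **existing, new_dc: rf}
        # repr() gives single-quoted strings and bare integers, as CQL map literals accept.
        options = ", ".join(f"'{key}': {value!r}" for key, value in replication.items())
        yield f"ALTER KEYSPACE {keyspace} WITH replication = {{{options}}};"
        yield "-- wait for the schema change to settle on every node"
        yield f"-- run a full repair of {keyspace} before the next step"


for step in scale_by_one_steps("my_ks", {"DC1": 3}, "DC2", 3):
    print(step)
```

Each `ALTER KEYSPACE` changes the electorate by a single replica, which is the property the workaround relies on.
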
>
> To respond to Mick: we could introduce an EACH_SERIAL which would permit this
> to be done in one go. This isn’t a super complicated piece of work, and I’d
> be happy to help review a contribution here. However, in my view we should be
> reconsidering how quorums are decided more comprehensively. This is very
> off-topic, but there are other more sensible quorums for multi-region setups
> (such as quorum-of-quorums), but also there’s a wide range of useful quorums
> we don’t support, particularly heterogenous ones supporting lower write
> failure tolerance than read failure tolerance (for instance). Today we
> support only the most extreme versions of this, and all of our quorums must
> be mixed manually by clients which is error prone. In my opinion we should be
> moving towards specifying quorums on a per-table basis for reads and writes,
> so that clients do not specify their consistency levels. This way the
> database can configure arbitrary quorums, and also guarantee that these
> quorums provide the desired consistency.
>
>
> From: Miles Garnsey <miles.garn...@datastax.com>
> Date: Friday, 20 August 2021 at 00:47
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Long time listener, first time caller here - hello!
>
> I am very interested in this part “Better safety among range movements:
> Electorate verification during range movements provides a stronger assertion
> of linearizability via assurance of the set of instances voting on a
> transaction.”
>
> I have seen issues in the wild where people want to add/remove DCs. I think
> that there may be a risk of consistency violations due to transactions
> circumventing the locks held by in-progress transactions. Will electorate
> verification help in the below scenario?
> Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3.
> DC2 is added, and once all nodes are in UN the schema is adjusted so that
> DC2’s RF=3.
> While the new schema propagates, there is a transitional state, in which some
> potential coordinators have the new schema S2, and others are operating on
> the old schema S1.
> In this state, S2 coordinators form consensus from 4/6 nodes, while S1
> coordinators form consensus from 2/3 nodes.
> A query issued from an S1 coordinator can form a valid consensus which will
> circumvent the lock held by an S2 coordinator.
> I was thinking of proposing an EACH_QUORUM serial CL, but if electorate
> verification solves the problem then that may be the better solution.
>
> Miles
>
>
>> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
>>
>> Benedict, thank you for sharing this CEP!
>>
>> Adding some notes on why I support this proposal:
>>
>> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on
>> reads is a huge improvement. This latency reduction may be sufficient to
>> allow many users of Cassandra who operate in a single datacenter,
>> availability zone, or region to migrate to a multi-region topology.
>>
>> - The Cluster Simulation work described in CEP-10 provides a toolchain for
>> probabilistically-exhaustive validation and simulation of transactional
>> correctness, allowing assertion of linearizability in the presence of
>> adversarial thread scheduling and message ordering over an unbounded number
>> of simulated clusters and transactions.
>>
>> - Some use cases may see a superlinear increase in LWT performance due to a
>> reduction in contention afforded by fewer message round-trips. E.g., halving
>> latency shortens the interval during which competing transactions may
>> conflict, reducing contention and improving throughput beyond a level that
>> would be afforded by the latency reduction alone.
>>
>> - Better safety among range movements: Electorate verification during range
>> movements provides a stronger assertion of linearizability via assurance of
>> the set of instances voting on a transaction.
>>
>> – Scott
>>
>> ________________________________________
>> From: bened...@apache.org <bened...@apache.org>
>> Sent: Wednesday, August 18, 2021 2:31 PM
>> To: dev@cassandra.apache.org
>> Subject: [DISCUSS] CEP 14: Paxos Improvements
>>
>> RE:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>>
>> I’m proposing this CEP for approval by the project. The goal is to both
>> improve the performance of LWTs and to ensure their correctness across a
>> range of scenarios such as range movements. This work builds upon the
>> Simulator CEP that has been recently adopted, and patches will follow in the
>> coming weeks.
>>
>> If you have any concerns or questions please raise them here for discussion.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>
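
For reference, the transitional state raised in this thread (coordinators on the old schema needing 2 of 3 nodes, coordinators on the new schema needing 4 of 6) can be checked by brute force. The Python sketch below is a deliberately simplified model, not Cassandra's Paxos implementation: node names and the function `disjoint_quorums` are hypothetical, and a quorum is treated as any set of nodes of the required size.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Hypothetical node names: three replicas in each datacenter.
dc1 = ["dc1-a", "dc1-b", "dc1-c"]
dc2 = ["dc2-a", "dc2-b", "dc2-c"]


def disjoint_quorums(
    old_nodes: Sequence[str], old_needed: int,
    new_nodes: Sequence[str], new_needed: int,
) -> List[Tuple[Set[str], Set[str]]]:
    """Return every (old-schema quorum, new-schema quorum) pair sharing no node."""
    return [
        (set(old), set(new))
        for old in combinations(old_nodes, old_needed)
        for new in combinations(new_nodes, new_needed)
        if not set(old) & set(new)
    ]


# Old schema S1: 2 of the 3 DC1 replicas. New schema S2: 4 of all 6 replicas.
pairs = disjoint_quorums(dc1, 2, dc1 + dc2, 4)
for old_quorum, new_quorum in pairs:
    print(sorted(old_quorum), "vs", sorted(new_quorum))
print(len(pairs))  # 3
```

Each of the three pairs consists of two DC1 nodes on one side and the remaining DC1 node plus all of DC2 on the other. Because such pairs exist, two coordinators on different schema versions can each believe they hold a valid quorum without any replica seeing both proposals, which is the lock circumvention described in the scenario.
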