> > In my opinion we should be moving towards specifying quorums on a per-table basis for reads and writes, so that clients do not specify their consistency levels.
This stood out to me: I'm a strong +1 on this. The less clients have to know about their powerful and complex distributed database and still gain the benefits of it, the better.

~Josh

On Fri, Aug 20, 2021 at 8:41 AM bened...@apache.org <bened...@apache.org> wrote:

> > My initial testing suggested it was not required (when the new DC is not serving reads).
>
> The problem is that today there’s no way to reliably exclude the new DC from serving reads, that I know of? If you can, then yes you would only need to ensure repair were run prior to activating reads from this DC.
>
> > Perhaps the CL mechanism could be pluggable
>
> I think this is unlikely, particularly as we start to consider things like consensus - at least any time soon. Quorums are quite intricately woven into any implementation, and it would be quite hard to fully generalise them. In practice we can probably accommodate any simple vote threshold quorums (those where some electorate each have a vote, and each vote has an equal weight that reaches consensus once a threshold is crossed) and support at least one level of nesting (so that DCs may logically vote as a block based on some quorum within a DC) in any topology without a plugin system, and I suspect this will be more than enough for any system in the foreseeable future.
>
> > I wonder if it should be a ‘default CL’ which can additionally be overridden by queries?
>
> There are some practicalities that probably prohibit us from eliminating user provided CLs, but I would like to see them phased out as far as possible as they are very hard to verify. To support this flexibility more generally I’d prefer to see tables offer potentially multiple consensus schemes with potentially different qualities (that can perhaps even be named by the user) for these cases, such as (for instance) fast-and-inconsistent-reads.
> This still permits their properties to be vetted by the database while offering flexibility to the user, and for them to declare at the operator level what meeting this concept requires. It also means the database can maintain these properties through any topology change.
>
> But we’ll probably have people using legacy CLs for another decade, so we’re going to have to support people querying with those CLs, but we might want to encourage people to disable them on their clusters and migrate to safer setups.
>
> From: Miles Garnsey <miles.garn...@datastax.com>
> Date: Friday, 20 August 2021 at 12:51
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Many thanks for this detailed response Benedict. I look forward to seeing the forthcoming proposals in relation to schema change safety when LWTs are in use.
>
> We have been following almost the scale-by-one workaround you described - I am grateful for the additional validation. The only divergence is that we have not been advising a repair in between each node addition. My initial testing suggested it was not required (when the new DC is not serving reads). But if you are aware of issues that arise at scale then I’d love to hear your experience, as we are still in the planning phase for that project.
>
> Regarding CLs (off topic)
>
> > To respond to Mick: we could introduce an EACH_SERIAL which would permit this to be done in one go. This isn’t a super complicated piece of work, and I’d be happy to help review a contribution here. However, in my view we should be reconsidering how quorums are decided more comprehensively.
> > This is very off-topic, but there are other more sensible quorums for multi-region setups (such as quorum-of-quorums), but also there’s a wide range of useful quorums we don’t support, particularly heterogeneous ones supporting lower write failure tolerance than read failure tolerance (for instance). Today we support only the most extreme versions of this, and all of our quorums must be mixed manually by clients which is error prone. In my opinion we should be moving towards specifying quorums on a per-table basis for reads and writes, so that clients do not specify their consistency levels. This way the database can configure arbitrary quorums, and also guarantee that these quorums provide the desired consistency.
>
> I agree with your points here. I’d add that the geographical location of DCs can be relevant.
> Perhaps the CL mechanism could be pluggable (in the same way that authn/z both are) so that we can experiment in this area at higher velocity? (I appreciate this is an invasive change.)
> A colleague and I are considering whether we might be able to look at the EACH_QUORUM idea in the shorter term. We will share more if we have the bandwidth to undertake the work.
> I also agree that CLs defined for tables is a worthy enhancement, I wonder if it should be a ‘default CL’ which can additionally be overridden by queries?
>
> In any event I feel I’ve hijacked your thread enough, but thank you again for the warm welcome and the interesting discussion!
>
>
> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
> >
> > Hello and welcome!
> >
> > So this is a really complicated topic, unfortunately, but the simple answer is that as currently formulated this work won’t address this particular case. The slightly longer answer is that this problem will be a thing of the past soon either way - there’s work incoming to address every possible category of this kind of problem, but it might take a little longer.
> >
> > The full answer is that membership of a keyspace in Cassandra is a mess, and is derived from the intersection of two things: schema and gossip. The electorate verification addresses _gossip_ inconsistencies, that is, inconsistencies about which nodes are perceived to be members of the ring. Schema generates the issue you are discussing here, in particular the lack of any state machine that transitions from one topology to another when a new schema implies a new topology. This is its own distinct problem, that others I work with plan to file a CEP for in the coming weeks or months.
> >
> > In the meantime, the correct way to do this (painful though it might be) is to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1 and wait for that to settle, *run repair* and then bump to RF=2, etc.
> >
> > To respond to Mick: we could introduce an EACH_SERIAL which would permit this to be done in one go. This isn’t a super complicated piece of work, and I’d be happy to help review a contribution here. However, in my view we should be reconsidering how quorums are decided more comprehensively. This is very off-topic, but there are other more sensible quorums for multi-region setups (such as quorum-of-quorums), but also there’s a wide range of useful quorums we don’t support, particularly heterogeneous ones supporting lower write failure tolerance than read failure tolerance (for instance). Today we support only the most extreme versions of this, and all of our quorums must be mixed manually by clients which is error prone. In my opinion we should be moving towards specifying quorums on a per-table basis for reads and writes, so that clients do not specify their consistency levels. This way the database can configure arbitrary quorums, and also guarantee that these quorums provide the desired consistency.
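
The quorum-of-quorums idea mentioned above, read together with the earlier description of threshold quorums with one level of nesting (DCs voting as a block), can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the three-DC topology, and the choice of a strict majority at both levels are hypothetical and are not taken from the CEP or from Cassandra's code.

```python
from typing import Dict, Set


def dc_quorum_met(acks: Set[str], replicas: Set[str]) -> bool:
    """True if a strict majority of this DC's replicas acknowledged."""
    return len(acks & replicas) >= len(replicas) // 2 + 1


def quorum_of_quorums_met(acks: Set[str], topology: Dict[str, Set[str]]) -> bool:
    """True if a strict majority of DCs each reached their own majority."""
    dcs_in_favour = sum(dc_quorum_met(acks, reps) for reps in topology.values())
    return dcs_in_favour >= len(topology) // 2 + 1


# Hypothetical topology: three DCs with three replicas each.
topology = {
    "dc1": {"a1", "a2", "a3"},
    "dc2": {"b1", "b2", "b3"},
    "dc3": {"c1", "c2", "c3"},
}

# 4 of 9 acks, but two DCs each have a local majority: the nested quorum is met.
print(quorum_of_quorums_met({"a1", "a2", "b1", "b3"}, topology))

# 5 of 9 acks (a flat majority), but only dc1 has a local majority: not met.
print(quorum_of_quorums_met({"a1", "a2", "a3", "b1", "c1"}, topology))
```

The two calls show that a nested quorum and a flat majority over all replicas can disagree in both directions, which is one reason such a scheme would need to be validated by the database rather than assembled by hand on the client.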
> >
> >
> > From: Miles Garnsey <miles.garn...@datastax.com>
> > Date: Friday, 20 August 2021 at 00:47
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> > Long time listener, first time caller here - hello!
> >
> > I am very interested in this part: “Better safety among range movements: Electorate verification during range movements provides a stronger assertion of linearizability via assurance of the set of instances voting on a transaction.”
> >
> > I have seen issues in the wild where people want to add/remove DCs. I think that there may be a risk of consistency violations due to transactions circumventing the locks held by in-progress transactions. Will electorate verification help in the below scenario?
> > Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3.
> > DC2 is added, and once all nodes are in UN the schema is adjusted so that DC2’s RF=3.
> > While the new schema propagates, there is a transitional state, in which some potential coordinators have the new schema S2, and others are operating on the old schema S1.
> > In this state, S2 coordinators form consensus from 4/6 nodes, while S1 coordinators form consensus from 2/3 nodes.
> > A query issued from an S1 coordinator can form a valid consensus which will circumvent the lock held by an S2 coordinator.
> > I was thinking of proposing an EACH_QUORUM serial CL, but if electorate verification solves the problem then that may be the better solution.
> >
> > Miles
> >
> >
> >> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
> >>
> >> Benedict, thank you for sharing this CEP!
> >>
> >> Adding some notes on why I support this proposal:
> >>
> >> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on reads is a huge improvement.
> >> This latency reduction may be sufficient to allow many users of Cassandra who operate in a single datacenter, availability zone, or region to migrate to a multi-region topology.
> >>
> >> - The Cluster Simulation work described in CEP-10 provides a toolchain for probabilistically-exhaustive validation and simulation of transactional correctness, allowing assertion of linearizability in the presence of adversarial thread scheduling and message ordering over an unbounded number of simulated clusters and transactions.
> >>
> >> - Some use cases may see a superlinear increase in LWT performance due to a reduction in contention afforded by fewer message round-trips. E.g., halving latency shortens the interval during which competing transactions may conflict, reducing contention and improving throughput beyond a level that would be afforded by the latency reduction alone.
> >>
> >> - Better safety among range movements: Electorate verification during range movements provides a stronger assertion of linearizability via assurance of the set of instances voting on a transaction.
> >>
> >> – Scott
> >>
> >> ________________________________________
> >> From: bened...@apache.org <bened...@apache.org>
> >> Sent: Wednesday, August 18, 2021 2:31 PM
> >> To: dev@cassandra.apache.org
> >> Subject: [DISCUSS] CEP 14: Paxos Improvements
> >>
> >> RE: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> >>
> >> I’m proposing this CEP for approval by the project. The goal is both to improve the performance of LWTs and to ensure their correctness across a range of scenarios such as range movements. This work builds upon the Simulator CEP that has been recently adopted, and patches will follow in the coming weeks.
> >>
> >> If you have any concerns or questions please raise them here for discussion.
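
The S1/S2 scenario in this thread (2/3 versus 4/6) and the add-one-node-at-a-time workaround can be checked with a small brute-force sketch in Python. It is a deliberately simplified model and rests on assumptions: quorums are plain majorities over the replica set each coordinator believes in, the node names are hypothetical, and repair, pending ranges and everything else Cassandra does during a topology change are ignored. It only asks whether every old-schema quorum shares at least one node with every new-schema quorum.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Size of a strict majority of n replicas."""
    return n // 2 + 1


def quorums(replicas):
    """Every minimal majority subset of the given replica list."""
    return [set(q) for q in combinations(replicas, majority(len(replicas)))]


def always_intersect(old_replicas, new_replicas) -> bool:
    """True if each old-schema quorum overlaps each new-schema quorum."""
    return all(a & b for a in quorums(old_replicas) for b in quorums(new_replicas))


# Hypothetical node names for the two datacenters.
dc1 = ["dc1-n1", "dc1-n2", "dc1-n3"]
dc2 = ["dc2-n1", "dc2-n2", "dc2-n3"]

# Adding DC2 at RF=3 in one step: 2/3 (S1) versus 4/6 (S2).
print("one step:", always_intersect(dc1, dc1 + dc2))

# Adding DC2 one replica at a time: 3 -> 4 -> 5 -> 6 replicas.
for added in range(1, 4):
    step_old = dc1 + dc2[: added - 1]
    step_new = dc1 + dc2[:added]
    print("step", added, always_intersect(step_old, step_new))
```

In this model the one-step change reports that some pair of quorums is disjoint (for example two DC1 nodes on the S1 side, and the third DC1 node plus all of DC2 on the S2 side), while each single-replica step keeps every pair of quorums overlapping. The sketch does not show why a repair is needed between steps; that part of the advice lies outside what this model captures.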