>
> In my opinion we should be moving towards specifying quorums on a
> per-table basis for reads and writes, so that clients do not specify their
> consistency levels.

This stood out to me: I'm a strong +1 on this. The less clients have to
know about their powerful and complex distributed database while still
gaining its benefits, the better.

~Josh

On Fri, Aug 20, 2021 at 8:41 AM bened...@apache.org <bened...@apache.org>
wrote:

> > My initial testing suggested it was not required (when the new DC is not
> serving reads).
>
> The problem is that today there’s no way to reliably exclude the new DC
> from serving reads, that I know of? If you can, then yes you would only
> need to ensure repair were run prior to activating reads from this DC.
>
> > Perhaps the CL mechanism could be pluggable
>
> I think this is unlikely, particularly as we start to consider things like
> consensus - at least any time soon. Quorums are quite intricately woven
> into any implementation, and it would be quite hard to fully generalise
> them. In practice we can probably accommodate any simple vote threshold
> quorums (those where some electorate each have a vote, and each vote has
> an equal weight that reaches consensus once a threshold is crossed) and
> support at least one level of nesting (so that DCs may logically vote as a
> block based on some quorum within a DC) in any topology without a plugin
> system, and I suspect this will be more than enough for any system in the
> foreseeable future.
>
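As a rough illustration of the simple vote-threshold quorums with one level
of nesting described above, here is a minimal Python sketch. The function
names, the node names and the choice of majority thresholds are assumptions
made for illustration only; none of this is Cassandra's actual API.

```python
from typing import Dict, Set


def majority(n: int) -> int:
    """Smallest number of votes that is a strict majority of n voters."""
    return n // 2 + 1


def threshold_met(electorate: Set[str], votes: Set[str], threshold: int) -> bool:
    """Simple vote-threshold quorum: each member has one equal-weight vote."""
    return len(electorate & votes) >= threshold


def quorum_of_quorums(dcs: Dict[str, Set[str]], votes: Set[str]) -> bool:
    """One level of nesting: a DC votes as a block once a majority of its own
    replicas has voted, and a majority of DC blocks is then required."""
    dc_blocks = {
        dc
        for dc, replicas in dcs.items()
        if threshold_met(replicas, votes, majority(len(replicas)))
    }
    return len(dc_blocks) >= majority(len(dcs))


# Hypothetical three-DC topology with RF=3 in each DC.
topology = {
    "dc1": {"a1", "a2", "a3"},
    "dc2": {"b1", "b2", "b3"},
    "dc3": {"c1", "c2", "c3"},
}

# dc1 and dc2 each reach their local majority, so two of three blocks vote.
reached = quorum_of_quorums(topology, {"a1", "a2", "b1", "b3"})
# Four votes again, but only dc3 reaches a local majority.
not_reached = quorum_of_quorums(topology, {"a1", "b1", "c1", "c2"})
print(reached, not_reached)
```

Both calls pass four votes; only the distribution across DCs differs, which
is the property a flat threshold cannot express.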
> > I wonder if it should be a ‘default CL’ which can additionally be
> overridden by queries?
>
> There are some practicalities that probably prohibit us from eliminating
> user provided CLs, but I would like to see them phased out as far as
> possible as they are very hard to verify. To support this flexibility more
> generally I’d prefer to see tables offer potentially multiple consensus
> schemes with potentially different qualities (that can perhaps even be
> named by the user) for these cases, such as (for instance)
> fast-and-inconsistent-reads. This still permits their properties to be
> vetted by the database while offering flexibility to the user, and for them
> to declare at the operator level what meeting this concept requires. It
> also means the database can maintain these properties through any topology
> change.
>
> But we’ll probably have people using legacy CLs for another decade, so
> we’re going to have to support people querying with those CLs, but we might
> want to encourage people to disable them on their clusters and migrate to
> safer setups.
>
> From: Miles Garnsey <miles.garn...@datastax.com>
> Date: Friday, 20 August 2021 at 12:51
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Many thanks for this detailed response Benedict. I look forward to seeing
> the forthcoming proposals in relation to schema change safety when LWTs are
> in use.
>
> We have been following almost the scale-by-one workaround you described -
> I am grateful for the additional validation. The only divergence is that we
> have not been advising a repair in between each node addition. My initial
> testing suggested it was not required (when the new DC is not serving
> reads). But if you are aware of issues that arise at scale then I’d love to
> hear your experience, as we are still in the planning phase for that
> project.
>
> Regarding CLs (off topic)
>
> > To respond to Mick: we could introduce an EACH_SERIAL which would permit
> this to be done in one go. This isn’t a super complicated piece of work,
> and I’d be happy to help review a contribution here. However, in my view we
> should be reconsidering how quorums are decided more comprehensively. This
> is very off-topic, but there are other more sensible quorums for
> multi-region setups (such as quorum-of-quorums), but also there’s a wide
> range of useful quorums we don’t support, particularly heterogeneous ones
> supporting lower write failure tolerance than read failure tolerance (for
> instance). Today we support only the most extreme versions of this, and all
> of our quorums must be mixed manually by clients which is error prone. In
> my opinion we should be moving towards specifying quorums on a per-table
> basis for reads and writes, so that clients do not specify their
> consistency levels. This way the database can configure arbitrary quorums,
> and also guarantee that these quorums provide the desired consistency.
>
> I agree with your points here. I’d add that the geographical location of
> DCs can be relevant.
> Perhaps the CL mechanism could be pluggable (in the same way that authn/z
> both are) so that we can experiment in this area at higher velocity? (I
> appreciate this is an invasive change.)
> A colleague and I are considering whether we might be able to look at the
> EACH_QUORUM idea in the shorter term. We will share more if we have the
> bandwidth to undertake the work.
> I also agree that CLs defined for tables are a worthy enhancement; I wonder
> if it should be a ‘default CL’ which can additionally be overridden by
> queries?
>
> In any event I feel I’ve hijacked your thread enough, but thank you again
> for the warm welcome and the interesting discussion!
>
> > On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
> >
> > Hello and welcome!
> >
> > So this is a really complicated topic, unfortunately, but the simple
> answer is that as currently formulated this work won’t address this
> particular case. The slightly longer answer is that this problem will be a
> thing of the past soon either way - there’s work incoming to address every
> possible category of this kind of problem, but it might take a little
> longer.
> >
> > The full answer is that membership of a keyspace in Cassandra is a mess,
> and is derived from the intersection of two things: schema and gossip. The
> electorate verification addresses _gossip_ inconsistencies, that is,
> inconsistencies about what nodes are perceived to be a member of the ring.
> Schema generates the issue you are discussing here. In particular the lack
> of any state machine that transitions from one topology to another when a
> new schema implies a new topology. This is its own distinct problem, that
> others I work with plan to file a CEP for in the coming weeks or months.
> >
> > In the meantime, the correct way to do this (painful though it might be)
> is to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at
> RF=1 and wait for that to settle, *run repair* and then bump to RF=2, etc.
> >
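One way to see why growing by one replica at a time helps: a majority of n
replicas and a majority of n+1 replicas (where the larger set contains the
smaller) always share a node, while a majority of 3 and a majority of 6 need
not. Below is a minimal Python sketch of that arithmetic. It assumes simple
majority quorums over all replicas and uses hypothetical helper names; it
does not model repair, which the message above still calls for between steps.

```python
def majority(n: int) -> int:
    """Smallest strict majority of n replicas."""
    return n // 2 + 1


def must_intersect(old_n: int, new_n: int) -> bool:
    """True if every majority of the old replica set shares at least one node
    with every majority of the new set, assuming one set contains the other.

    Pigeonhole argument: two subsets of a universe of size U are forced to
    overlap exactly when their sizes sum to more than U.
    """
    universe = max(old_n, new_n)
    return majority(old_n) + majority(new_n) > universe


# DC1 stays at RF=3 while DC2 is raised one replica at a time (3 -> 4 -> 5 -> 6).
for old_n, new_n in [(3, 4), (4, 5), (5, 6)]:
    print(f"{old_n} -> {new_n}: overlap guaranteed = {must_intersect(old_n, new_n)}")

# Jumping straight from 3 to 6 replicas: 2 + 4 = 6, which is not more than 6.
print(f"3 -> 6: overlap guaranteed = {must_intersect(3, 6)}")
```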
> > To respond to Mick: we could introduce an EACH_SERIAL which would permit
> this to be done in one go. This isn’t a super complicated piece of work,
> and I’d be happy to help review a contribution here. However, in my view we
> should be reconsidering how quorums are decided more comprehensively. This
> is very off-topic, but there are other more sensible quorums for
> multi-region setups (such as quorum-of-quorums), but also there’s a wide
> range of useful quorums we don’t support, particularly heterogeneous ones
> supporting lower write failure tolerance than read failure tolerance (for
> instance). Today we support only the most extreme versions of this, and all
> of our quorums must be mixed manually by clients which is error prone. In
> my opinion we should be moving towards specifying quorums on a per-table
> basis for reads and writes, so that clients do not specify their
> consistency levels. This way the database can configure arbitrary quorums,
> and also guarantee that these quorums provide the desired consistency.
> >
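To make the trade-off between read and write failure tolerance concrete,
here is a small Python sketch that lists the minimal read/write thresholds
whose quorums are guaranteed to intersect over n replicas (read + write =
n + 1). The type and function names are hypothetical, and this covers only
flat threshold quorums, not the per-table configuration proposed above.

```python
from typing import List, NamedTuple


class QuorumPair(NamedTuple):
    read: int                      # replicas that must answer a read
    write: int                     # replicas that must acknowledge a write
    read_failures_tolerated: int   # replicas that may be down for reads
    write_failures_tolerated: int  # replicas that may be down for writes


def overlapping_pairs(n: int) -> List[QuorumPair]:
    """Minimal (read, write) thresholds over n replicas with read + write = n + 1,
    so any read quorum shares at least one replica with any write quorum."""
    pairs = []
    for read in range(1, n + 1):
        write = n + 1 - read
        pairs.append(QuorumPair(read, write, n - read, n - write))
    return pairs


# Hypothetical table with five replicas.
for pair in overlapping_pairs(5):
    print(pair)
```

The first and last rows are the extreme cases (read one, write all, and the
reverse); the rows in between trade write failure tolerance for read failure
tolerance one replica at a time.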
> >
> > From: Miles Garnsey <miles.garn...@datastax.com>
> > Date: Friday, 20 August 2021 at 00:47
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> > Long time listener, first time caller here - hello!
> >
> > I am very interested in this part: “Better safety among range movements:
> Electorate verification during range movements provides a stronger
> assertion of linearizability via assurance of the set of instances voting
> on a transaction.”
> >
> > I have seen issues in the wild where people want to add/remove DCs. I
> think that there may be a risk of consistency violations due to transactions
> circumventing the locks held by in-progress transactions. Will electorate
> verification help in the below scenario?
> > Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at
> RF=3.
> > DC2 is added, and once all nodes are in UN the schema is adjusted so
> that DC2’s RF=3.
> > While the new schema propagates, there is a transitional state, in which
> some potential coordinators have the new schema S2, and others are
> operating on the old schema S1.
> > In this state, S2 coordinators form consensus from 4/6 nodes, while S1 coordinators
> form consensus from 2/3 nodes.
> > A query issued from an S1 coordinator can form a valid consensus which
> will circumvent the lock held by an S2 coordinator.
> > I was thinking of proposing an EACH_QUORUM serial CL, but if electorate
> verification solves the problem then that may be the better solution.
> >
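The scenario above can be checked by brute force. This Python sketch
enumerates the minimal majorities an S1 coordinator (3 replicas in DC1) and
an S2 coordinator (6 replicas across DC1 and DC2) could use, and looks for
pairs with no node in common. The node names are hypothetical and the model
is only simple majority voting, not the actual Paxos implementation.

```python
from itertools import combinations
from typing import List, Set


def majority(n: int) -> int:
    """Smallest strict majority of n replicas."""
    return n // 2 + 1


def minimal_quorums(replicas: Set[str]) -> List[Set[str]]:
    """Every subset of the replicas that is exactly a bare majority."""
    size = majority(len(replicas))
    return [set(c) for c in combinations(sorted(replicas), size)]


# Hypothetical node names; RF=3 in each DC.
dc1 = {"dc1-n1", "dc1-n2", "dc1-n3"}
dc2 = {"dc2-n1", "dc2-n2", "dc2-n3"}

s1_quorums = minimal_quorums(dc1)        # old schema S1: 2 of 3
s2_quorums = minimal_quorums(dc1 | dc2)  # new schema S2: 4 of 6

# Pairs of quorums that share no node: neither side can observe the other.
disjoint = [(q1, q2) for q1 in s1_quorums for q2 in s2_quorums if not (q1 & q2)]

print(f"S1 quorums: {len(s1_quorums)}, S2 quorums: {len(s2_quorums)}")
print(f"disjoint pairs found: {len(disjoint)}")
if disjoint:
    q1, q2 = disjoint[0]
    print("example:", sorted(q1), "vs", sorted(q2))
```

Any disjoint pair is a case where an S1 coordinator and an S2 coordinator
could each reach consensus without seeing the other's in-progress proposal.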
> > Miles
> >
> >
> >> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
> >>
> >> Benedict, thank you for sharing this CEP!
> >>
> >> Adding some notes on why I support this proposal:
> >>
> >> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x
> on reads is a huge improvement. This latency reduction may be sufficient to
> allow many users of Cassandra who operate in a single datacenter,
> availability zone, or region to migrate to a multi-region topology.
> >>
> >> - The Cluster Simulation work described in CEP-10 provides a toolchain
> for probabilistically-exhaustive validation and simulation of transactional
> correctness, allowing assertion of linearizability in the presence of
> adversarial thread scheduling and message ordering over an unbounded number
> of simulated clusters and transactions.
> >>
> >> - Some use cases may see a superlinear increase in LWT performance due
> to a reduction in contention afforded by fewer message round-trips. E.g.,
> halving latency shortens the interval during which competing transactions
> may conflict, reducing contention and improving throughput beyond a level
> that would be afforded by the latency reduction alone.
> >>
> >> - Better safety among range movements: Electorate verification during
> range movements provides a stronger assertion of linearizability via
> assurance of the set of instances voting on a transaction.
> >>
> >> – Scott
> >>
> >> ________________________________________
> >> From: bened...@apache.org <bened...@apache.org>
> >> Sent: Wednesday, August 18, 2021 2:31 PM
> >> To: dev@cassandra.apache.org
> >> Subject: [DISCUSS] CEP 14: Paxos Improvements
> >>
> >> RE:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> >>
> >> I’m proposing this CEP for approval by the project. The goal is to both
> improve the performance of LWTs and to ensure their correctness across a
> range of scenarios like range movements. This work builds upon the Simulator
> CEP that has been recently adopted, and patches will follow in the coming
> weeks.
> >>
> >> If you have any concerns or questions please raise them here for
> discussion.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
>
