> Is the new implementation a separate, distinctly modularized new body of work
It's primarily a distinct, modularised and new body of work; however, some
shared code has been modified - namely PaxosState, in which legacy code is
maintained but modified for compatibility, and the system.paxos table (which
receives a new column and slightly modified serialization code). It is
conceptually an optimised version of the existing algorithm.
If there's a chance of it being of value to 4.0, I can try to put up a patch
next week alongside a high-level description of the changes.
> But a performance regression is a regression, I'm not shrugging it off.
I don't want to give the impression I'm shrugging off the correctness issue
either. It's a serious issue to fix, but since all successful updates to the
database are linearizable, I think it's likely that many applications behave
correctly with the present semantics, or at least encounter only transient
errors. No doubt many also do not, but I have no idea of the ratio.
The regression isn't itself a simple issue either - depending on the topology
and message latencies it is not difficult to produce inescapable contention
(i.e. guaranteed timeouts) that might persist as long as clients continue to
retry. It could be quite a serious degradation of service to impose on our
users.
I don't pretend to know the correct way to make a decision balancing these
considerations, but I am perhaps more concerned about imposing service outages
than I am about temporarily maintaining semantics our users have apparently
accepted for years - though I absolutely share your embarrassment there.
On 12/11/2020, 12:41, "Joshua McKenzie" wrote:
Is the new implementation a separate, distinctly modularized new body of
work or does it make substantial changes to existing implementation and
subsume it?
On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne wrote:
> Regarding option #4, I'll remark that experience tends to suggest users
> don't consistently read the `NEWS.txt` file on upgrade, so option #4 will
> likely essentially mean "LWT has a correctness issue, but once it broke
> your data enough that you'll notice, you'll be able to dig the proper flag
> to fix it for next time". I guess it's better than nothing, of course, but
> I'll admit that defaulting to "opt-in correctness", especially for a
> feature (LWT) that exists uniquely to provide additional guarantees, is
> something I have a hard time rallying behind.
>
> But a performance regression is a regression, I'm not shrugging it off.
> Still, I feel we shouldn't leave LWT with a fairly serious known
> correctness bug and I frankly feel bad for "the project" that this has been
> known for so long without action, so I'm a bit biased in wanting to get it
> fixed asap.
>
> But maybe I'm overstating the urgency here, and maybe option #1 is a better
> way forward.
>
> --
> Sylvain
>