> 1. Major SSTable changes should begin with forward-compatibility in a
> prior release.

This requires "feature" changes, i.e. new non-trivial code for previous
patch releases. It also entails porting over any further format
modifications.

Instead of this, in combination with your second point, why not implement
backwards write compatibility? The opt-in is then clearer to define: e.g.
upgrades start with a "4.1-compatible" settings set that includes file
format compatibility and disables new features, while new nodes start with
the "current" settings set. When the upgrade completes and the user is happy
with the result, the settings set can be replaced.

Doesn't this achieve what you want (and what we all agree is a worthy goal)
with much less effort for everyone? Supporting backwards-compatible writing
is trivial, and we even have a proof of concept in the stats metadata
serializer. It also significantly reduces the amount of work and thinking
one has to do when implementing a format improvement. For example, the TTL
patch can address this in exactly the way the problem was addressed in
earlier versions of the format, by capping to 2038, without any need to
specify, obey or test any configuration flags.
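To make the idea concrete, here is a rough sketch in Java. All names in it (CompatWriteSketch, CompatibilityProfile, expirationForWrite) are hypothetical and are not Cassandra's actual classes or configuration. It assumes the older format stores the expiration time as a signed 32-bit count of seconds since the epoch, and that the newer format reads the same 32 bits as unsigned; it only shows that the selected settings set, not a per-feature flag, decides whether expiration is capped to 2038.

```java
import java.time.Instant;

// Hypothetical sketch: not Cassandra's actual classes or configuration.
class CompatWriteSketch {
    // A "settings set" chosen at startup: upgrading nodes start with the
    // 4.1-compatible profile, new nodes start with CURRENT.
    enum CompatibilityProfile {
        CASSANDRA_4_1(true, false),
        CURRENT(false, true);

        final boolean writeLegacyFormat;   // write SSTables readable by 4.1
        final boolean newFeaturesEnabled;  // features the old format cannot represent

        CompatibilityProfile(boolean writeLegacyFormat, boolean newFeaturesEnabled) {
            this.writeLegacyFormat = writeLegacyFormat;
            this.newFeaturesEnabled = newFeaturesEnabled;
        }
    }

    // Assumed legacy encoding: signed 32-bit seconds since the epoch,
    // whose maximum is 2038-01-19T03:14:07Z.
    static final long LEGACY_MAX_EXPIRATION = Integer.MAX_VALUE;

    // Assumed newer encoding: the same 32 bits read as unsigned (year 2106).
    static final long CURRENT_MAX_EXPIRATION = 0xFFFFFFFFL;

    // Computes the expiration time to serialize. Under the legacy profile the
    // value is capped to 2038, as earlier format versions did; no extra flag.
    static long expirationForWrite(long nowSeconds, int ttlSeconds, CompatibilityProfile profile) {
        long expiration = nowSeconds + ttlSeconds;
        long max = profile.writeLegacyFormat ? LEGACY_MAX_EXPIRATION : CURRENT_MAX_EXPIRATION;
        return Math.min(expiration, max);
    }

    public static void main(String[] args) {
        long now = Instant.parse("2023-02-22T00:00:00Z").getEpochSecond();
        int twentyYears = 20 * 365 * 24 * 3600;
        for (CompatibilityProfile p : CompatibilityProfile.values()) {
            long exp = expirationForWrite(now, twentyYears, p);
            // CASSANDRA_4_1 prints the capped value, 2038-01-19T03:14:07Z
            System.out.println(p + " -> " + Instant.ofEpochSecond(exp));
        }
    }
}
```

Once the upgrade is finalized, switching the profile to CURRENT is the only change needed to start writing the uncapped values.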

>> It’s a commitment, and it requires every contributor to consider it as
>> part of work they produce.

> But it shouldn't be a burden. Ability to downgrade is a testable problem,
> so I see this work as a function of the suite of tests the project is
> willing to agree on supporting.

I fully agree with this sentiment, and I feel that the current "try not to
introduce breaking changes" approach adds the burden without the benefits,
because those benefits cannot be proven, and downgrades are most likely
already broken.

Regards,
Branimir

On Wed, Feb 22, 2023 at 1:01 AM Abe Ratnofsky <a...@aber.io> wrote:

> Some interesting existing work on this subject is "Understanding and
> Detecting Software Upgrade Failures in Distributed Systems" -
> https://dl.acm.org/doi/10.1145/3477132.3483577,
> also summarized by Andrey Satarin here:
> https://asatarin.github.io/talks/2022-09-upgrade-failures-in-distributed-systems/
>
> They specifically tested Cassandra upgrades and report a solid list of
> defects that they found. They also describe their testing tool,
> DUPTester, which includes a component that confirms that a node can start
> up on the next version using the state left over by the previous one. The
> paper also highlights a wider range of upgrade defects, beyond SSTable
> version support.
>
> I believe the project would benefit from expanding our test suite
> similarly, by parametrizing more tests on upgrade version pairs.
>
> Also, per Benedict's comment:
>
> > It’s a commitment, and it requires every contributor to consider it as
> > part of work they produce.
>
> But it shouldn't be a burden. Ability to downgrade is a testable problem,
> so I see this work as a function of the suite of tests the project is
> willing to agree on supporting.
>
> Specifically, I agree with Scott's proposal to emulate the HDFS
> upgrade-then-finalize approach. I would also support automatic finalization
> based on a time threshold or similar, to balance the priorities of safe and
> straightforward upgrades. Users need to be aware of the range of SSTable
> formats supported by a given version, and of how to proceed when their
> SSTables would not be supported by an upcoming upgrade.
>
> --
> Abe
>


-- 
Branimir Lambov
e. branimir.lam...@datastax.com
w. www.datastax.com
