If we can get opt-in major format upgrades, as well as an offline 
sstabledowngrade tool, I think we have a good first step that would make 
downgrades possible.

Given Jacek’s work on the sstable format API, and the work from Yuki and Claude 
on old formats, I think we are pretty close to having both of those be viable?

With the opt-in major format upgrades, I think the main thing will be to ensure 
that every new feature built around the new format either fails gracefully or, 
where it changes behavior, falls back to the old behavior until the new format 
is available.  If a new feature is behind a feature flag, this could be a 
simple check that throws a configuration exception if the feature is enabled but 
the new sstable format is not available.
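
A minimal sketch of what such a startup check might look like, in Java. Every 
name here (FormatGate, the nested ConfigurationException, the feature names) is 
hypothetical and made up for illustration; none of it is taken from the 
Cassandra codebase or from an agreed design.

```java
import java.util.Set;

/** Illustrative only: these names are assumptions, not Cassandra APIs. */
final class FormatGate {

    /** Hypothetical stand-in for a configuration error raised at startup. */
    static final class ConfigurationException extends RuntimeException {
        ConfigurationException(String message) {
            super(message);
        }
    }

    /**
     * Rejects a configuration that enables a feature requiring the new sstable
     * major format while the operator has not opted in to writing that format.
     *
     * @param newFormatEnabled         whether the operator opted in to the new format
     * @param enabledFeatures          feature flags turned on in the configuration
     * @param featuresNeedingNewFormat features that only work with the new format
     */
    static void validate(boolean newFormatEnabled,
                         Set<String> enabledFeatures,
                         Set<String> featuresNeedingNewFormat) {
        if (newFormatEnabled) {
            return; // nothing to check once the new format is in use
        }
        for (String feature : enabledFeatures) {
            if (featuresNeedingNewFormat.contains(feature)) {
                throw new ConfigurationException(
                    "Feature '" + feature + "' requires the new sstable format, "
                    + "which is not enabled");
            }
        }
    }
}
```

With this shape, calling FormatGate.validate(false, Set.of("feature_x"), 
Set.of("feature_x")) would throw, while the same call with the first argument 
set to true would pass.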

No features have yet merged that bump the sstable major version, but a few are 
finishing up that will.  Do we want to block merging those changes until 
discussions here finish?  I don’t think that we need to?  The ticket which 
brings in the ability to opt-in to the sstable format change can also fix up 
the existing code to check the flag?

-Jeremiah


> On Feb 21, 2023, at 10:29 AM, Benedict <bened...@apache.org> wrote:
> 
> As always, Scott puts it much more eloquently than I can. 
> 
> The only thing I’d quibble with is that I think it is better to make changes 
> backwards compatible, rather than make earlier releases forwards compatible - 
> and, where this is prohibitively costly, to simply make a feature that depends 
> on it unavailable until the switch to the new major format.
> 
> This provides the greatest flexibility for users, as they can upgrade from 
> and downgrade to the same versions. There’s no scrambling for a different 
> downgrade target you haven’t qualified when finding out there’s an 
> unacceptable bug. 
> There’s also less delta between pre-upgrade and post-downgrade behaviour.
> 
> We have plenty of practice doing this kind of thing. It’s not that hard.
> 
> But, if we want to go the forward compatibility route that’s still far better 
> than nothing.
> 
> 
>> On 21 Feb 2023, at 16:17, C. Scott Andreas <sc...@paradoxica.net> wrote:
>> 
>> 
>> I realize my feedback on this has been spread across tickets and older 
>> mailing list / wiki discussions, so I'll offer a proposal here.
>> 
>> Starting with goals -
>> 
>> 1. Cassandra users must be able to abort and revert an upgrade to a new 
>> version of the database that introduces a new major SSTable format.
>> 
>> This reduces risk of upgrading to a build that also introduces a 
>> non-data-format-related bug that is intolerable. This goal does not specify 
>> a mechanism or downgrade target - just the "downgradability" goal.
>> 
>> 2. Where possible, Cassandra users should be able to opt into writing of a 
>> new major SSTable format.
>> 
>> This reduces that risk further by allowing users to decouple data format 
>> changes from the upgrade itself. There may be cases where new features or 
>> bug fixes prevent this from being possible, but I'll offer it as a goal.
>> 
>> 3. It should be possible for users to perform the downgrade in-place by 
>> launching the database using a previous version's binary.
>> 
>> This avoids the need for complex orchestration of offline commands like a 
>> hypothetical `downgradesstables`.
>> 
>> 
>> The following approach would allow us to accomplish these goals:
>> 
>> 1. Major SSTable changes should begin with forward-compatibility in a prior 
>> release.
>> 
>> In a release prior to one that revs major SSTable versions, we should 
>> implement the ability to read the SSTables that we intend to write in the 
>> next major version. This would allow someone to (eg.,) revert from 5.0 to 
>> 4.2 if they encountered a regression that caused an outage without data 
>> loss. This downgrade path should be well-specified and called out in 
>> NEWS.txt.
>> 
>> 2. Where possible, major SSTable format changes should be opt-in (if the 
>> features / bugfixes introduced allow).
>> 
>> This would be via a flag to enable writing the new format once an operator 
>> has determined that post-upgrade their clusters are sufficiently stable. 
>> This is an approach that HDFS has adopted. Following a rolling upgrade of 
>> HDFS, downgrade remains possible until an operator executes a "finalize" 
>> operation to migrate NameNode metadata to the new version's. An approach 
>> like this would allow users to perform a staged upgrade in which they first 
>> rev the version of the database, followed by opting into its new format to 
>> derisk (eg.,) adoption of BTI-indexed SSTables.
>> 
>> These approaches aren't meant to discourage SSTable format evolution - but 
>> to make it safer, and ideally faster. They don't specify duplicative 
>> serialization or a game of Twister to hide fields in locations where old 
>> versions don't think to look. Forward compatibility in a prior release could 
>> be landed at the same time as the major format revision itself, so long as 
>> we cut releases from both branches.
>> 
>> Ability to back out an upgrade until finalized would dramatically lower the 
>> risk of adopting new releases of Apache Cassandra. For many users, the 
>> qualification cycle for a new release is more than a year - and a *lot* of 
>> work.
>> 
>> Reducing the risk of upgrading to new releases repositions Cassandra as a 
>> database that can be treated with greater trust -- especially for 
>> multi-petabyte, mission critical systems. Our user community will advance to 
>> newer releases more quickly and we'll be able to shorten the maintenance 
>> cycles for older releases. In the same way that CI stability enables us to 
>> move faster and more confidently in the project, safety features like this 
>> will enable our users (and indeed ourselves) to move more confidently to 
>> adopt them.
>> 
>> – Scott
>> 
>> 
>>> On Feb 21, 2023, at 4:51 AM, "Claude Warren, Jr via dev" 
>>> <dev@cassandra.apache.org> wrote:
>>> 
>>> 
>>> My goal in implementing CASSANDRA-8928 
>>> <https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8928>
>>>  was to be able to take the current version 4.x and write it as the 
>>> earliest 3.x version possible.  The reasoning being that if that was 
>>> possible then whatever 3.x version was executed would be able to 
>>> automatically read the early 3.x version.  My thought was that each release 
>>> version would have the ability to downgrade to the earliest previous 
>>> version.  In this way if users need to they could string together a number 
>>> of downgrader versions to move from 5.x to 3.x. 
>>> 
>>> My testing has been pretty straightforward. I created 4 docker containers 
>>> using the standard published Cassandra docker images for 3.1 and 4.0 with 
>>> data mounted on an external drive.  Two of the containers (one of each 
>>> version) did not automatically start Cassandra.  My process was then:
>>> 1. Start and stop Cassandra 4.0 to create the default data files.
>>> 2. Start the Cassandra 4.0 container that does not automatically run 
>>> Cassandra and execute the new downgrade functionality.
>>> 3. Start the Cassandra 3.1 container and dump the logs.  If the system 
>>> started then I knew that I at least had a proof of concept.  So far no-go.
>>> 
>>> 
>>> On Tue, Feb 21, 2023 at 8:57 AM Branimir Lambov 
>>> <branimir.lam...@datastax.com <mailto:branimir.lam...@datastax.com>> wrote:
>>>> It appears to me that the first thing we need to start this feature off is 
>>>> a definition of a suite of tests together with a set of rules to keep the 
>>>> suite up to date with new features as they are introduced. The moment that 
>>>> suite is in place, we can start having some confidence that we can enforce 
>>>> downgradability.
>>>> 
>>>> Something like this will definitely catch incompatibilities in SSTable 
>>>> formats (such as the one in CASSANDRA-17698 that I managed to miss during 
>>>> review), but will also be able to identify incompatible system schema 
>>>> changes among others, and at the same time rightfully ignore non-breaking 
>>>> changes such as modifications to the key cache serialization formats.
>>>> 
>>>> Is downgradability in scope for 5.0? It is a feature like any other, and I 
>>>> don't see any difficulty adding it (with support for downgrade to 4.x) a 
>>>> little later in the 5.x timeline.
>>>> 
>>>> Regards,
>>>> Branimir
>>>> 
>>>> 
>>>> On Tue, Feb 21, 2023 at 9:40 AM Jacek Lewandowski 
>>>> <lewandowski.ja...@gmail.com <mailto:lewandowski.ja...@gmail.com>> wrote:
>>>>> I'd like to mention CASSANDRA-17056 (CEP-17) here as it aims to introduce 
>>>>> multiple sstable formats support. It allows for providing an 
>>>>> implementation of SSTableFormat along with SSTableReader and 
>>>>> SSTableWriter. That could be extended easily to support different 
>>>>> implementations for certain version ranges, like one impl for ma-nz, 
>>>>> other for oa+, etc. without having a confusing implementation with a lot 
>>>>> of conditional blocks. Old formats in such case could be maintained 
>>>>> separately from the main code and easily switched any time. 
>>>>> 
>>>>> thanks
>>>>> - - -- --- ----- -------- -------------
>>>>> Jacek Lewandowski
>>>>> 
>>>>> 
>>>>> On Tue, 21 Feb 2023 at 01:46, Yuki Morishita <yu...@apache.org 
>>>>> <mailto:yu...@apache.org>> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> What I wanted to address in my comment in 
>>>>>> CASSANDRA-8110 (https://issues.apache.org/jira/browse/CASSANDRA-8110?focusedCommentId=17641705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17641705)
>>>>>>  is to focus on better upgrade experience.
>>>>>> 
>>>>>> Upgrading the cluster can be painful for some orgs with mission critical 
>>>>>> Cassandra cluster, where they cannot tolerate less availability because 
>>>>>> of the inability to replace the downed node.
>>>>>> They also need to plan rolling back to the previous state when something 
>>>>>> happens along the way.
>>>>>> The change I proposed in CASSANDRA-8110 is to achieve the goal of at 
>>>>>> least enabling SSTable streaming during the upgrade by not upgrading the 
>>>>>> SSTable version. This lets the cluster easily roll back to the 
>>>>>> previous version.
>>>>>> Downgrading SSTable is not the primary focus (though Cassandra needs to 
>>>>>> implement the way to write SSTable in older versions, so it is somewhat 
>>>>>> related.)
>>>>>> 
>>>>>> I'm preparing the design doc for the change.
>>>>>> Also, if I should create a separate ticket from CASSANDRA-8110 for the 
>>>>>> clarity of the goal of the change, please let me know.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Feb 21, 2023 at 5:31 AM Benedict <bened...@apache.org 
>>>>>> <mailto:bened...@apache.org>> wrote:
>>>>>>> 
>>>>>>> FWIW I think 8110 is the right approach, even if it isn’t a panacea. We 
>>>>>>> will have to eventually also tackle system schema changes (probably not 
>>>>>>> hard), and may have to think a little carefully about other things, eg 
>>>>>>> with TTLs the format change is only the contract about what values can 
>>>>>>> be present, so we have to make sure the data validity checks are 
>>>>>>> consistent with the format we write. It isn’t as simple as writing an 
>>>>>>> earlier version in this case (unless we permit truncating the TTL, 
>>>>>>> perhaps) 
>>>>>>> 
>>>>>>> On 20 Feb 2023, at 20:24, Benedict <bened...@apache.org 
>>>>>>> <mailto:bened...@apache.org>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In a self-organising community, things that aren’t self-policed 
>>>>>>>> naturally end up policed in an adhoc manner, and with difficulty. I’m 
>>>>>>>> not sure that’s the same as arbitrary enforcement. It seems to me the 
>>>>>>>> real issue is nobody noticed this was agreed and/or forgot and didn’t 
>>>>>>>> think about it much. 
>>>>>>>> 
>>>>>>>> But, even without any prior agreement, it’s perfectly reasonable to 
>>>>>>>> request that things do not break compatibility if they do not need to, 
>>>>>>>> as part of the normal patch integration process.
>>>>>>>> 
>>>>>>>> Issues with 3.1->4.0 aren’t particularly relevant as they predate any 
>>>>>>>> agreement to do this. But we can and should address the problem of new 
>>>>>>>> columns in schema tables, as this happens often between versions. I’m 
>>>>>>>> not sure it has in 4.1 though?
>>>>>>>> 
>>>>>>>> Regarding downgrade versions, surely this should simply be the same as 
>>>>>>>> upgrade versions we support?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 20 Feb 2023, at 20:02, Jeff Jirsa <jji...@gmail.com 
>>>>>>>>> <mailto:jji...@gmail.com>> wrote:
>>>>>>>>> 
>>>>>>>>> I'm not convinced even 8110 addresses this - just writing 
>>>>>>>>> sstables in old versions won't help if we ever add things like new 
>>>>>>>>> types or new types of collections without other control abilities. 
>>>>>>>>> Claude's other email in another thread a few hours ago talks about 
>>>>>>>>> some of these surprises - "Specifically during the 3.1 -> 4.0 changes 
>>>>>>>>> a column broadcast_port was added to system/local.  This means that 
>>>>>>>>> 3.1 system can not read the table as it has no definition for it.  I 
>>>>>>>>> tried marking the column for deletion in the metadata and in the 
>>>>>>>>> serialization header.  The latter got past the column not found 
>>>>>>>>> problem, but I suspect that it just means that data columns after 
>>>>>>>>> broadcast_port shifted and so incorrectly read." - this is a harder 
>>>>>>>>> problem to solve than just versioning sstables and network protocols. 
>>>>>>>>> 
>>>>>>>>> Stepping back a bit, we have downgrade ability listed as a goal, but 
>>>>>>>>> it's not (as far as I can tell) universally enforced, nor is it clear 
>>>>>>>>> at which point we will be able to concretely say "this release can be 
>>>>>>>>> downgraded to X".   Until we actually define and agree that this is a 
>>>>>>>>> real goal with a concrete version where downgrade-ability becomes 
>>>>>>>>> real, it feels like things are somewhat arbitrarily enforced, which 
>>>>>>>>> is probably very frustrating for people trying to commit work/tickets.
>>>>>>>>> 
>>>>>>>>> - Jeff
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Feb 20, 2023 at 11:48 AM Dinesh Joshi <djo...@apache.org 
>>>>>>>>> <mailto:djo...@apache.org>> wrote:
>>>>>>>>>> I’m a big fan of maintaining backward compatibility. Downgradability 
>>>>>>>>>> implies that we could potentially roll back an upgrade at any time. 
>>>>>>>>>> While I don’t think we need to retain the ability to downgrade in 
>>>>>>>>>> perpetuity it would be a good objective to maintain strict backward 
>>>>>>>>>> compatibility and therefore downgradability until a certain point. 
>>>>>>>>>> This would imply versioning metadata and extending it in such a way 
>>>>>>>>>> that prior version(s) could continue functioning. This can certainly 
>>>>>>>>>> be expensive to implement and might bloat on-disk storage. However, 
>>>>>>>>>> we could always offer an option for the operator to optimize the 
>>>>>>>>>> on-disk structures for the current version then we can rewrite them 
>>>>>>>>>> in the latest version. This optimizes the storage and opens up new 
>>>>>>>>>> functionality. This means new features that can work with old 
>>>>>>>>>> on-disk structures will be available while others that strictly 
>>>>>>>>>> require new versions of the data structures will be unavailable 
>>>>>>>>>> until the operator migrates to the new version. This migration IMO 
>>>>>>>>>> should be irreversible. Beyond this point the operator will lose the 
>>>>>>>>>> ability to downgrade which is ok.
>>>>>>>>>> 
>>>>>>>>>> Dinesh
>>>>>>>>>> 
>>>>>>>>>>> On Feb 20, 2023, at 10:40 AM, Jake Luciani <jak...@gmail.com 
>>>>>>>>>>> <mailto:jak...@gmail.com>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> There has been progress on 
>>>>>>>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8928
>>>>>>>>>>> 
>>>>>>>>>>> Which is similar to what datastax does for DSE. Would this be an 
>>>>>>>>>>> acceptable solution?
>>>>>>>>>>> 
>>>>>>>>>>> Jake 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Feb 20, 2023 at 11:17 AM guo Maxwell <cclive1...@gmail.com 
>>>>>>>>>>> <mailto:cclive1...@gmail.com>> wrote:
>>>>>>>>>>>> It seems “An alternative solution is to implement/complete 
>>>>>>>>>>>> CASSANDRA-8110 
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-8110>” can give 
>>>>>>>>>>>> us more options if it is finished😉
>>>>>>>>>>>> 
>>>>>>>>>>>> Branimir Lambov <blam...@apache.org 
>>>>>>>>>>>> <mailto:blam...@apache.org>> wrote on Mon, Feb 20, 2023 at 11:03 PM:
>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There has been a discussion lately about changes to the sstable 
>>>>>>>>>>>>> format in the context of being able to abort a cluster upgrade, 
>>>>>>>>>>>>> and the fact that changes to sstables can prevent downgraded 
>>>>>>>>>>>>> nodes from reading any data written during their temporary 
>>>>>>>>>>>>> operation with the new version.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Most of the discussion is in CASSANDRA-18134 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-18134>, and is 
>>>>>>>>>>>>> spreading into CASSANDRA-14227 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-14227> and 
>>>>>>>>>>>>> CASSANDRA-17698 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17698>, none of 
>>>>>>>>>>>>> which is a good place to discuss the topic seriously.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Downgradability is a worthy goal and is listed in the current 
>>>>>>>>>>>>> roadmap. I would like to open a discussion here on how it would 
>>>>>>>>>>>>> be achieved.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My understanding of what has been suggested so far translates to:
>>>>>>>>>>>>> - avoid changes to sstable formats;
>>>>>>>>>>>>> - if there are changes, implement them in a way that is 
>>>>>>>>>>>>> backwards-compatible, e.g. by duplicating data, so that a new 
>>>>>>>>>>>>> version is presented in a component or portion of a component 
>>>>>>>>>>>>> that legacy nodes will not try to read;
>>>>>>>>>>>>> - if the latter is not feasible, make sure the changes are only 
>>>>>>>>>>>>> applied if a feature flag has been enabled.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To me this approach introduces several risks:
>>>>>>>>>>>>> - it bloats file and parsing complexity;
>>>>>>>>>>>>> - it discourages improvement (e.g. CASSANDRA-17698 is no longer a 
>>>>>>>>>>>>> LHF ticket once this requirement is in place);
>>>>>>>>>>>>> - it needs care to avoid risky solutions to address technical 
>>>>>>>>>>>>> issues with the format versioning (e.g. staying on n-versions for 
>>>>>>>>>>>>> 5.0 and needing a bump for a 4.1 bugfix might require porting 
>>>>>>>>>>>>> over support for new features);
>>>>>>>>>>>>> - it requires separate and uncoordinated solutions to the problem 
>>>>>>>>>>>>> and switching mechanisms for each individual change.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> An alternative solution is to implement/complete CASSANDRA-8110 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-8110>, which 
>>>>>>>>>>>>> provides a method of writing sstables for a target version. 
>>>>>>>>>>>>> During upgrades, a node could be set to produce sstables 
>>>>>>>>>>>>> corresponding to the older version, and there is a very 
>>>>>>>>>>>>> straightforward way to implement modifications to formats like 
>>>>>>>>>>>>> the tickets above to conform to its requirements. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do people think should be the way forward?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Branimir
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> -- 
>>>>>>>>>>>> you are the apple of my eye !
>>>>>>>>>>> 
>>>>>>>>>>> -- 
>>>>>>>>>>> http://twitter.com/tjake
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Branimir Lambov
>>>> e. branimir.lam...@datastax.com <mailto:branimir.lam...@datastax.com>
>>>> w. www.datastax.com <http://www.datastax.com/>
>>>> 
>> 
>> 
