> My expectation is that in trunk SCM CASSANDRA_4 would change to SCM
> CASSANDRA_5.
Assuming you upgrade from 4.0 to 5.0, then you are running on CASSANDRA_4… how many people know that they are expected to do something about that (Sam documented the steps earlier)? What if you leave things alone and try to upgrade to 5.1/6.0… now what? What about users who create a new 5.0 cluster… we still default to compatibility mode in this case, so a new 5.0 cluster is running with CASSANDRA_4…

> This is why I want to remove the coupling between SCM and messaging version.

Feels like we just had a similar conversation, Sam, with regard to TCM / Accord ;)

I don’t see messaging version as the problem; I feel that messaging and disk versions are intertwined, and that is what causes this confusion (they are the same serializers)… If we are running with messaging version VERSION_40, why does it matter if we write to disk with VERSION_40 or VERSION_50? If we want to support downgrade, we should block _50 and only use _40 for disk, but why should networking not be allowed to use _50? What we write to disk impacts our ability to downgrade, and messaging already has the ability to downgrade its version if its peers don’t know the latest version.

In short, I agree with you, Sam: we should decouple… I think it makes sense for SCM to control the version we use for disk, but not networking…

> On Dec 12, 2024, at 8:46 AM, Sam Tunnicliffe <s...@beobal.com> wrote:
>
> No, we initially tried to preserve all the previous paths and put the whole
> thing behind a feature flag, but it was just way too pervasive and doing so
> would've added years to the project. So for the period before the CMS is
> initialized, certain operations are not available.
>
> However, it should be entirely possible to downgrade and roll back to 5.0
> after cutting over to TCM, as long as SSTables are still in the old format.
> By "should be" I mean it is absolutely possible and has been tested, but it
> requires the SCM to guard the on-disk format, which has the unfortunate
> effect of limiting the messaging version, and that in turn makes it
> impossible to actually cut over to TCM. That is, the testing has been done
> with a patch which disables some things which rely on messaging VERSION_51.
> This is why I want to remove the coupling between SCM and messaging version.
>
> Also, I misspoke slightly in my previous email because I forgot that we did
> manage to enable a decent subsection of TCM to work with
> VERSION_40/VERSION_50. In this scenario, you still get the linearized schema
> updates via the metadata log, but replicas/coordinators don't exchange epochs
> during reads/writes, so the consistency guarantees are weakened.
>
> Thanks,
> Sam
>
>> On 12 Dec 2024, at 16:17, Jeremiah Jordan <jeremiah.jor...@gmail.com> wrote:
>>
>> My expectation is that in trunk SCM CASSANDRA_4 would change to SCM
>> CASSANDRA_5. I think we should be striving to support full
>> downgrade/rollback ability to the previous major version from trunk.
>> With TCM, I would expect that, when running in CASSANDRA_5 mode,
>> initializing TCM would not be possible, as once initialized you could no
>> longer roll back.
>> Do we have no way to support the gossip paths continuing to work prior to
>> initializing TCM?
>>
>> -Jeremiah
>>
>> On Dec 11, 2024 at 7:41:48 AM, Sam Tunnicliffe <s...@beobal.com> wrote:
>>> My point is that the upgrade to 5.1/6.0 isn't really complete until the CMS
>>> is initialised, and this can't be done while running with SCM CASSANDRA_4
>>> because of the messaging service limitation. Until that point, schema
>>> changes & node replacements are not supported, which affects how long a
>>> bake time is tolerable.
>>> This specific issue could probably be fixed by revisiting the SCM
>>> implementation in 5.1/6.0, so we should certainly do that, but the fact
>>> remains that we don't have great test coverage to indicate how clusters
>>> behave when running in SCM for a prolonged period.
>>>
>>> Thanks,
>>> Sam
>>>
>>>> On 11 Dec 2024, at 13:29, Brandon Williams <dri...@gmail.com> wrote:
>>>>
>>>> On Wed, Dec 11, 2024 at 7:22 AM Sam Tunnicliffe <s...@beobal.com> wrote:
>>>>>
>>>>> so running in any SCM mode for a prolonged period is not really viable.
>>>>
>>>> This is what many users want to do, though: upgrade one DC and let it
>>>> bake to see how it goes before continuing. I don't think that's
>>>> unreasonable, but from working on CASSANDRA-20118 I know how difficult
>>>> that is already. I don't think we've built enough SCM muscle yet to
>>>> think about handling multiple previous versions.
>>>>
>>>> Kind Regards,
>>>> Brandon
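The decoupling proposed at the top of this thread (SCM caps only the on-disk version, while messaging keeps negotiating down to whatever each peer understands) can be sketched roughly as follows. This is a minimal, hypothetical illustration: the class, enum, and method names and the numeric version values are assumptions made up for the example, not Cassandra's actual implementation.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: names and version numbers are illustrative only and do not
// correspond to real Cassandra classes or constants.
public final class DecoupledVersions {

    // Illustrative serialization versions, ordered oldest to newest.
    static final int VERSION_40 = 40;
    static final int VERSION_50 = 50;
    static final int VERSION_51 = 51;

    // Newest messaging version this (hypothetical) node can speak.
    static final int CURRENT_MESSAGING_VERSION = VERSION_51;

    // Hypothetical compatibility modes; each one pins only the newest on-disk version.
    enum CompatibilityMode {
        CASSANDRA_4(VERSION_40),
        CASSANDRA_5(VERSION_50),
        NONE(VERSION_51);

        final int maxDiskVersion;

        CompatibilityMode(int maxDiskVersion) {
            this.maxDiskVersion = maxDiskVersion;
        }
    }

    // Disk writes are capped by the mode, so an older binary can still read them after a downgrade.
    static int diskVersion(CompatibilityMode mode) {
        return Math.min(mode.maxDiskVersion, CURRENT_MESSAGING_VERSION);
    }

    // Messaging ignores the mode: use the newest version both this node and the peer know.
    static int messagingVersion(int peerMaxVersion) {
        return Math.min(CURRENT_MESSAGING_VERSION, peerMaxVersion);
    }

    // Features needing a newer wire format are gated on what every peer supports, not on the mode.
    static boolean allPeersSupport(Collection<Integer> peerMaxVersions, int required) {
        for (int v : peerMaxVersions) {
            if (messagingVersion(v) < required) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CompatibilityMode mode = CompatibilityMode.CASSANDRA_4;
        System.out.println("disk version: " + diskVersion(mode));           // 40
        System.out.println("to 5.1 peer: " + messagingVersion(VERSION_51)); // 51
        System.out.println("to 5.0 peer: " + messagingVersion(VERSION_50)); // 50
        System.out.println("all upgraded: "
                + allPeersSupport(List.of(VERSION_51, VERSION_51), VERSION_51)); // true
        System.out.println("mixed cluster: "
                + allPeersSupport(List.of(VERSION_51, VERSION_50), VERSION_51)); // false
    }
}
```

With the mode left at CASSANDRA_4, the node in this sketch still writes version-40 data to disk, preserving the downgrade path, yet two upgraded peers can talk VERSION_51 to each other. Anything that relies on VERSION_51 is gated on the peers' negotiated versions rather than on the compatibility mode.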