Hi Jon,

It is a mixture of things really. Firstly, it is a legacy issue: there have been performance problems during upgrades in the past. These have now been fixed, but it is not easy to regain trust in the process.

Secondly, there are some very large clusters involved: 1300+ nodes across multiple physical datacenters. In this case upgrades are only done out of hours, and only one datacenter per day, so a normal upgrade cycle will take multiple weeks and this one will take 3 times as long. This is a very large organisation with some very fixed rules and processes, so the Cassandra team does need to fit within these constraints, and we have limited ability to influence any changes.

But even forgetting these constraints, in a previous organisation (100+ clusters) which had very good automation for this sort of thing, I can still see this process taking 3 times as long to complete as a normal upgrade, and this does take up operators' time.

I can see the advantages of the 3 stage process, and all things being equal I would recommend that process as being safer. However, I am getting a lot of push back whenever we discuss the upgrade process.

Thanks

Paul

> On 17 Dec 2024, at 19:24, Jon Haddad <rustyrazorbl...@apache.org> wrote:
>
> Just curious, why is a rolling restart difficult? Is it a tooling issue,
> stability, just overall fear of messing with things?
>
> You *should* be able to do a rolling restart without it being an issue. I
> look at this as a fundamental workflow that every C* operator should have
> available, and you should be able to do them without there being any concern.
>
> Jon
>
>
> On 2024/12/17 16:01:06 Paul Chandler wrote:
>> All,
>>
>> We are getting a lot of push back on the 3 stage process of going through
>> the three compatibility modes to upgrade to Cassandra 5. This basically
>> means 3 rolling restarts of a cluster, which will be difficult for some of
>> our large multi DC clusters.
>>
>> Having researched this, it looks like, if you are not going to create large
>> TTL’s, it would be possible to go straight from C*4 to C*5 with SCM NONE.
>> This seems to be the same as it would have been going from 4.0 -> 4.1
>>
>> Is there any reason why this should not be done? Has anyone had experience
>> of upgrading in this way?
>>
>> Thanks
>>
>> Paul Chandler
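
For readers following the thread, the scheduling arithmetic in the reply (one datacenter per out-of-hours window, three rolling restarts instead of one) can be roughed out as below. This is only a sketch: the function name, the `dcs_per_night` parameter, and the count of 6 datacenters are assumptions for illustration, since the thread does not say how many datacenters a cluster spans.

```python
import math


def upgrade_nights(datacenters: int, rolling_restarts: int, dcs_per_night: int = 1) -> int:
    """Out-of-hours windows needed when every rolling restart must visit every datacenter."""
    nights_per_pass = math.ceil(datacenters / dcs_per_night)
    return nights_per_pass * rolling_restarts


# Hypothetical cluster with 6 datacenters (not a figure from the thread).
single_pass = upgrade_nights(datacenters=6, rolling_restarts=1)
three_stage = upgrade_nights(datacenters=6, rolling_restarts=3)
print(single_pass, three_stage)
```

Under the one-datacenter-per-night rule the three-stage path always costs three times the windows of a single pass for a given cluster, which is the multiplier the reply refers to.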