> Secondly, there are some very large clusters involved, 1300+ nodes across multiple physical datacenters. In this case any upgrades are only done out of hours and only one datacenter per day, so a normal upgrade cycle will take multiple weeks, and this one will take three times as long.
If you only restart one machine at a time, then yes, this will take a while. With these environments it's better to restart an entire rack at once, which should significantly cut down on the time it takes to restart a cluster. This is how all the large orgs I've worked in roll out big changes.

Regardless, it might be possible to make the compatibility mode something that can be changed without a restart, through JMX. While that would solve your immediate problem by avoiding it, I'd strive to solve the underlying problem: your org is running Cassandra with unnecessarily limiting practices that make your life harder.

Jon

On Tue, Dec 17, 2024 at 12:37 PM Paul Chandler <p...@redshots.com> wrote:

> Hi Jon,
>
> It is a mixture of things really. Firstly, it is a legacy issue: there
> have been performance problems in the past during upgrades. These have now
> been fixed, but it is not easy to regain trust in the process.
>
> Secondly, there are some very large clusters involved, 1300+ nodes across
> multiple physical datacenters. In this case any upgrades are only done out
> of hours and only one datacenter per day, so a normal upgrade cycle will
> take multiple weeks, and this one will take three times as long.
>
> This is a very large organisation with some very fixed rules and
> processes, so the Cassandra team needs to fit within these constraints,
> and we have limited ability to influence any changes.
>
> But even forgetting these constraints, in a previous organisation (100+
> clusters) which had very good automation for this sort of thing, I can
> still see this process taking three times as long to complete as a normal
> upgrade, and that does take up operators' time.
>
> I can see the advantages of the three-stage process, and all things being
> equal I would recommend it as being safer; however, I am getting a lot of
> push back whenever we discuss the upgrade process.
>
> Thanks
>
> Paul
>
> > On 17 Dec 2024, at 19:24, Jon Haddad <rustyrazorbl...@apache.org> wrote:
> >
> > Just curious, why is a rolling restart difficult? Is it a tooling
> > issue, stability, just overall fear of messing with things?
> >
> > You *should* be able to do a rolling restart without it being an issue.
> > I look at this as a fundamental workflow that every C* operator should
> > have available, and you should be able to do one without there being any
> > concern.
> >
> > Jon
> >
> >
> > On 2024/12/17 16:01:06 Paul Chandler wrote:
> >> All,
> >>
> >> We are getting a lot of push back on the three-stage process of going
> >> through the three compatibility modes to upgrade to Cassandra 5. This
> >> basically means three rolling restarts of a cluster, which will be
> >> difficult for some of our large multi-DC clusters.
> >>
> >> Having researched this, it looks like, if you are not going to create
> >> large TTLs, it would be possible to go straight from C* 4 to C* 5 with
> >> SCM NONE. This seems to be the same as it would have been going from
> >> 4.0 -> 4.1.
> >>
> >> Is there any reason why this should not be done? Has anyone had
> >> experience of upgrading in this way?
> >>
> >> Thanks
> >>
> >> Paul Chandler
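For context on the setting under discussion: "SCM" here refers to the `storage_compatibility_mode` option in Cassandra 5's cassandra.yaml. A minimal sketch of the direct-to-NONE approach Paul describes is below; this is an illustration, not a recommendation, and you should verify the setting and its caveats against the NEWS.txt of your specific 5.x release before relying on it.

```yaml
# cassandra.yaml (Cassandra 5.x) -- fragment only, not a complete config.
#
# The conservative upgrade path is CASSANDRA_4 -> UPGRADING -> NONE,
# with one rolling restart per step. Setting NONE directly on the
# upgrade to 5.x skips the two intermediate restarts, at the cost of
# the compatibility safety net (relevant e.g. to large TTLs that
# extend past the old 2038 storage limit).
storage_compatibility_mode: NONE
```

Whether the skipped steps matter depends on workload details such as TTL usage, which is exactly the question raised in this thread.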