> We (Wikimedia) have had more (major) upgrades go wrong in some way than right. Any significant upgrade is going to be weeks, if not months, in the making, with careful testing, a phased rollout, and a workable plan for rollback. We'd never entertain doing more than one at a time; it's just way too many moving parts.
The question wasn't about why upgrades are hard, it was about why a rolling
restart of the cluster is hard. They're different things.

* Yes, upgrades should go through a rigorous qualification process.
* No, rolling restarts shouldn't be a major endeavor.

If an organization has thousands of Cassandra nodes, it should also have
tooling to perform rolling restarts of a cluster, whether one node at a
time, multiple nodes in a rack, or an entire rack at a time. I consider
this fundamental to operating Cassandra at scale. I've worked with
organizations that have had this dialed in well, and ones that have done it
by hand. The ones that did 1K nodes by hand really hated rolling restarts.
The ones that did it well didn't care at all, because it was behind
automation, and we'd do it whenever we needed to, not just during off
hours.

Jon

On Wed, Dec 18, 2024 at 9:27 AM Eric Evans <john.eric.ev...@gmail.com> wrote:
>
> On Tue, Dec 17, 2024 at 2:37 PM Paul Chandler <p...@redshots.com> wrote:
>
>> It is a mixture of things really. Firstly, it is a legacy issue: there
>> have been performance problems in the past during upgrades. These have
>> now been fixed, but it is not easy to regain trust in the process.
>>
>> Secondly, there are some very large clusters involved, 1300+ nodes
>> across multiple physical datacenters. In this case any upgrades are only
>> done out of hours and only one datacenter per day, so a normal upgrade
>> cycle will take multiple weeks, and this one will take three times as
>> long.
>>
>> This is a very large organisation with some very fixed rules and
>> processes, so the Cassandra team does need to fit within these
>> constraints, and we have limited ability to influence any changes.
>
> I can second all of this.
>
> We (Wikimedia) have had more (major) upgrades go wrong in some way than
> right. Any significant upgrade is going to be weeks, if not months, in
> the making, with careful testing, a phased rollout, and a workable plan
> for rollback.
> We'd never entertain doing more than one at a time; it's just way too
> many moving parts.
>
>> But even forgetting these constraints, in a previous organisation (100+
>> clusters) which had very good automation for this sort of thing, I can
>> still see this process taking three times as long to complete as a
>> normal upgrade, and this does take up operators' time.
>>
>> I can see the advantages of the 3-stage process, and all things being
>> equal I would recommend that process as being safer; however, I am
>> getting a lot of push back whenever we discuss the upgrade process.
>>
>> Thanks
>>
>> Paul
>>
>> > On 17 Dec 2024, at 19:24, Jon Haddad <rustyrazorbl...@apache.org> wrote:
>> >
>> > Just curious, why is a rolling restart difficult? Is it a tooling
>> > issue, stability, or just overall fear of messing with things?
>> >
>> > You *should* be able to do a rolling restart without it being an
>> > issue. I look at this as a fundamental workflow that every C* operator
>> > should have available, and you should be able to do them without there
>> > being any concern.
>> >
>> > Jon
>> >
>> >
>> > On 2024/12/17 16:01:06 Paul Chandler wrote:
>> >> All,
>> >>
>> >> We are getting a lot of push back on the 3-stage process of going
>> >> through the three compatibility modes to upgrade to Cassandra 5. This
>> >> basically means 3 rolling restarts of a cluster, which will be
>> >> difficult for some of our large multi-DC clusters.
>> >>
>> >> Having researched this, it looks like, if you are not going to create
>> >> large TTLs, it would be possible to go straight from C* 4 to C* 5
>> >> with SCM NONE. This seems to be the same as it would have been going
>> >> from 4.0 -> 4.1.
>> >>
>> >> Is there any reason why this should not be done? Has anyone had
>> >> experience of upgrading in this way?
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
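Jon's point about restart tooling, that batches should be one node, several nodes in a rack, or a whole rack at a time, can be sketched as a small batching helper. This is a minimal illustration, not any real tool: the `restart_batches` function, node addresses, and rack names are all hypothetical. The safety idea it encodes is real, though: with rack-aware replication (NetworkTopologyStrategy), each rack holds at most one replica of a given range, so a batch should never span racks.

```python
# Hypothetical sketch of rack-aware batching for a rolling restart.
# Function name, node addresses, and rack names are made up for
# illustration; no real automation tool is being described here.
from collections import defaultdict
from typing import Dict, List


def restart_batches(node_racks: Dict[str, str],
                    nodes_per_batch: int) -> List[List[str]]:
    """Group nodes into restart batches that never span racks.

    Restarting several nodes at once is only safe when they share a
    rack: taking down part of one rack costs at most one replica per
    partition under rack-aware placement.
    """
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for node, rack in sorted(node_racks.items()):
        by_rack[rack].append(node)

    batches: List[List[str]] = []
    for rack in sorted(by_rack):
        nodes = by_rack[rack]
        # Split each rack's nodes into fixed-size batches.
        for i in range(0, len(nodes), nodes_per_batch):
            batches.append(nodes[i:i + nodes_per_batch])
    return batches


if __name__ == "__main__":
    topology = {
        "10.0.1.1": "rack1", "10.0.1.2": "rack1",
        "10.0.2.1": "rack2", "10.0.2.2": "rack2",
    }
    print(restart_batches(topology, 1))  # one node at a time
    print(restart_batches(topology, 2))  # a whole rack at a time
```

In practice each batch would be wrapped in the usual per-node steps (e.g. `nodetool drain`, restart the service, then wait for the node to report UN in `nodetool status` before moving to the next batch), which is exactly the part worth putting behind automation.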