On Wed, Dec 18, 2024 at 12:12 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> I think we're talking about different things.
>
> >  Yes, and Paul clarified that it wasn't (just) an issue of having to do
> rolling restarts, but the work involved in doing an upgrade.  Were it only
> the case that the hardest part of doing an upgrade was the rolling
> restart...
>
> From several messages ago:
>
> > This basically means 3 rolling restarts of a cluster, which will be
> difficult for some of our large multi DC clusters.
>
> The discussion was specifically about rolling restarts and how storage
> compatibility mode requires them, which in this environment was described
> as difficult.  The difficulty of the rest of the process is irrelevant here,
> because it's the same regardless of how you approach storage compatibility
> mode.  My point is that rolling restarts should not be difficult if you
> have the right automation, which you seem to agree with.
>
> Want to discuss the difficulty of upgrading in general?  I'm all for
> improving it.  It's just not what this thread is about.
>

You're right, I'm at least partly conflating other (recent) dev threads
about upgrade trajectories, sorry about that.  It still reads to me, though,
as an issue of change management (vis-à-vis what's happening that has us
restarting) versus the mechanics of rolling restarts, and that was what I
was alluding to.  If it *is* strictly about rolling restart logistics, I am
a) surprised (I didn't know this was a problem for anyone), and b) going to
sit quietly now and try to understand why that is. :)


> On Wed, Dec 18, 2024 at 10:01 AM Eric Evans <john.eric.ev...@gmail.com>
> wrote:
>
>>
>>
>> On Wed, Dec 18, 2024 at 11:43 AM Jon Haddad <j...@rustyrazorblade.com>
>> wrote:
>>
>>> > We (Wikimedia) have had more (major) upgrades go wrong in some way,
>>> than right.  Any significant upgrade is going to be weeks —if not months—
>>> in the making, with careful testing, a phased rollout, and a workable plan
>>> for rollback.  We'd never entertain doing more than one at a time, it's
>>> just way too many moving parts.
>>>
>>> The question wasn't about why upgrades are hard, it was about why a
>>> rolling restart of the cluster is hard.  They're different things.
>>>
>>
>> Yes, and Paul clarified that it wasn't (just) an issue of having to do
>> rolling restarts, but the work involved in doing an upgrade.  Were it only
>> the case that the hardest part of doing an upgrade was the rolling
>> restart...
>>
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>>
>

-- 
Eric Evans
john.eric.ev...@gmail.com
