OK, it seems like I didn’t explain it too well, but yes, it is the three 
rolling restarts required as part of the upgrade that are causing the push 
back. My message was a bit vague on the use cases because there are 
confidentiality agreements in place, so I can’t share too much.

We have had problems in the past with rolling restarts on our very large 
cluster. From what I remember, when a node restarted it was under huge load 
for a while, due to the large number of gossip messages accumulated from all 
the other nodes. At the same time a large number of clients were trying to 
reconnect, and the bcrypt (struggling to remember if that is the correct 
name) hashing was taking a lot of processing, which meant the first clients 
to connect were seeing very high latencies while the rest of the connections 
were processed.

This is all old history and has since been fixed, so it is not really what the 
question was about; however, these old problems have left a bad legacy in the 
memory of the people that matter. Hence the push back we have now.

I would like to thank Jeff for pointing out that my pain could be legitimate, 
but I would also like to thank everyone else for answering too.

I have tried Jon’s suggestion of accessing the SCM (storage compatibility 
mode) through JMX/nodetool, and I have managed to change the setting while the 
node is up, removing the need for a restart. However, the sstable format is 
only configured at startup, so the node continues to write nb* sstables. This 
is not really a subject for this list, so I will follow Scott’s suggestion and 
start a thread on the dev list to discuss it. I will do that tomorrow.
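For anyone following along, my understanding is that the three restarts come 
from stepping the storage compatibility mode through its values in 
cassandra.yaml. A minimal sketch, assuming the 5.0 option name and values 
(this is my reading of the documented upgrade path, not something I have 
verified end to end on a large cluster):

```yaml
# cassandra.yaml (Cassandra 5.0) -- the value is stepped across three
# rolling restarts of the cluster:
#
# 1. Upgrade binaries to 5.0 and start each node with:
storage_compatibility_mode: CASSANDRA_4   # keeps writing 4.x-era (nb) sstables
#
# 2. Second rolling restart, each node changed to:
# storage_compatibility_mode: UPGRADING
#
# 3. Final rolling restart, each node changed to:
# storage_compatibility_mode: NONE        # full 5.0 behaviour and formats
```

And since the value is only read at startup, changing it via JMX on a running 
node does not switch the sstable writer, which matches what I saw above.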

I think the answer to my original question is that no, nobody has gone 
straight from C*4 to C*5 (NONE), and it is not recommended.

Thanks everyone

Paul

> On 18 Dec 2024, at 18:45, Eric Evans <john.eric.ev...@gmail.com> wrote:
> 
> 
> 
> On Wed, Dec 18, 2024 at 12:12 PM Jon Haddad <j...@rustyrazorblade.com 
> <mailto:j...@rustyrazorblade.com>> wrote:
> I think we're talking about different things.  
> 
> >  Yes, and Paul clarified that it wasn't (just) an issue of having to do 
> > rolling restarts, but the work involved in doing an upgrade.  Were it only 
> > the case that the hardest part of doing an upgrade was the rolling 
> > restart...
> 
> From several messages ago:
> 
> > This basically means 3 rolling restarts of a cluster, which will be 
> > difficult for some of our large multi DC clusters.
> 
> The discussion was specifically about rolling restarts and how storage 
> compatibility mode requires them, which in this environment was described as 
> difficult.  The difficulty of the rest of the process is irrelevant here, 
> because it's the same regardless of how you approach storage compatibility 
> mode.  My point is that rolling restarts should not be difficult if you have 
> the right automation, which you seem to agree with.
> 
> Want to discuss the difficulty of upgrading in general?  I'm all for 
> improving it.  It's just not what this thread is about.
> 
> You're right, I'm at least partly conflating other (recent) dev threads about 
> upgrade trajectories, sorry about that.  It still reads to me though as an 
> issue of change management (vis-a-vis what's happening that has us 
> restarting) versus the mechanics of rolling restarts, and that was what I was 
> alluding to.  If it is strictly about rolling restart logistics, I am a) 
> surprised (I didn't know this was a problem for anyone), and b) will sit 
> quietly now and try to understand why that is. :)
>  
> On Wed, Dec 18, 2024 at 10:01 AM Eric Evans <john.eric.ev...@gmail.com 
> <mailto:john.eric.ev...@gmail.com>> wrote:
> 
> 
> On Wed, Dec 18, 2024 at 11:43 AM Jon Haddad <j...@rustyrazorblade.com 
> <mailto:j...@rustyrazorblade.com>> wrote:
> > We (Wikimedia) have had more (major) upgrades go wrong in some way, than 
> > right.  Any significant upgrade is going to be weeks —if not months— in the 
> > making, with careful testing, a phased rollout, and a workable plan for 
> > rollback.  We'd never entertain doing more than one at a time, it's just 
> > way too many moving parts.
> 
> The question wasn't about why upgrades are hard, it was about why a rolling 
> restart of the cluster is hard.  They're different things.
> 
> Yes, and Paul clarified that it wasn't (just) an issue of having to do 
> rolling restarts, but the work involved in doing an upgrade.  Were it only 
> the case that the hardest part of doing an upgrade was the rolling restart...
> 
> -- 
> Eric Evans
> john.eric.ev...@gmail.com <mailto:john.eric.ev...@gmail.com>
> 
> 
> -- 
> Eric Evans
> john.eric.ev...@gmail.com <mailto:john.eric.ev...@gmail.com>
