On Wed, 18 Jun 2025, at 18:09, Eugen Block wrote:
> That does look strange indeed, either an upgrade went wrong or someone  
> already fiddled with the monmap, I'd say. But anyway, I wouldn't try  
> to deploy a 4th mon since it would want to sync the store, but we  
> don't know in which state the store actually is. And besides that,  
> 2 out of 4 MONs still isn't a quorum, so there's no real  
> benefit. So my best bet would be on the mon with the most recent  
> store. And if the cluster comes back up with one mon, you'll need to  
> wipe the traces of the previous mons so DeepSea can redeploy  
> additional mons cleanly. Or is the cluster not managed by DeepSea  
> anymore?

My replies to fragments of the above are inline below:

> either an upgrade went wrong or someone already fiddled with the monmap

That's entirely possible. I'm playing the role of the "guy who knows a bit about 
Ceph", trying to un-explode an old cluster on an unsupported OS and hardware. The 
original deployers are long since gone, and the day-to-day admins were never 
given much of a handover. There are legends of several phases of upgrades and 
deployment-system replacements, but concrete documentation is thin on the 
ground. Indeed, I recently found evidence of failed OS upgrades that broke 
part of the RGW services years ago.

I had previously documented a plan to migrate the cluster to new, supported 
hardware, OS, and Ceph version, but the client was still thinking it over when 
this happened.


> wouldn't try to deploy a 4th mon
The idea behind the 4th MON was simply to see whether I could deploy a new MON 
without breaking the cluster much further. However, given that I can't get more 
than one MON to start, it's pretty broken right now. If that deployment had 
worked, I intended to remove and redeploy each of the other two MONs before 
retiring the 4th MON again. A side benefit is that it lets me test part of my 
cluster-upgrade plan. One of the cluster's clients is OpenStack, which in my 
experience is pretty "sentimental" about its set of MON IPs.


> mon with the most recent store
How would I find out which MON that is? I'm told mon3 was the last one 
operating (but it emits a wall of "e6 handle_auth_request failed to assign 
global_id" logs when running). mon2 is the one that survives if you try to 
start all of them. I've tried inspecting the store databases (RocksDB 
key-value stores, it turns out, rather than SQLite), but can't get much 
comprehensible info out of them yet; I have no experience tinkering with 
RocksDB, though I'm fine with an "actual" SQL REPL. I can't get quorum, so I 
can't run "ceph ..." commands, but I can talk to each of the MONs on their 
Unix admin sockets while they're running.
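If the on-disk stores are intact, something like the following might answer the 
"most recent store" question without quorum. This is only a sketch: the store 
paths and mon IDs (mon1/mon2/mon3) are guesses for this cluster, and the 
ceph-monstore-tool binary must match the installed Ceph version.

```shell
# With the mons stopped, compare each mon's store directly.
# show-versions prints the first/last committed versions of the store;
# the mon with the highest "last committed" has the most recent store.
ceph-monstore-tool /var/lib/ceph/mon/ceph-mon1 show-versions
ceph-monstore-tool /var/lib/ceph/mon/ceph-mon2 show-versions
ceph-monstore-tool /var/lib/ceph/mon/ceph-mon3 show-versions

# While a mon is running, its admin socket also works without quorum,
# run locally on that mon's node:
ceph daemon mon.mon2 mon_status
# or equivalently, via the socket path:
ceph --admin-daemon /var/run/ceph/ceph-mon.mon2.asok mon_status
```

Taking a copy of each store directory before poking at it would be prudent, 
since ceph-monstore-tool opens the store directly.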


> Or is the cluster not managed by DeepSea anymore?
I don't think it is. None of the admins (nor I) have much depth in Salt (I'm 
more of an Ansible person). The aforementioned "legends" of the system's 
lifetime also say there were multiple different management systems over the 
years. I've mainly used the existing Salt config to break into managed nodes I 
didn't yet have an account on, and to do fleet-wide "shell command" operations. 
Given the probability of historic broken OS upgrades and possibly abandoned 
Salt management, I'd be wary of trusting it for deployment automation.


(Now in a later email)
> Although I'm not a dev, I looked into the code [0] anyway.
> 
> The comments before the maybe_resize_cluster function say:
> 
>   * If a cluster is undersized (with respect to max_mds), then
>   * attempt to find daemons to grow it. If the cluster is oversized
>   * (with respect to max_mds) then shrink it by stopping its highest rank.
> 
> Is it possible that an operator/admin tried to resize (shrink or grow  
> the number of MDS daemons) the MDS cluster? Or was a DeepSea stage  
> executed in order to deploy additional daemons? Maybe some history  
> could help understand what might have happened.

Yes, I saw all that too. I was told this all started because one of the 
admins noticed that the CephFS service was slow and was reporting laggy MDSes. 
That may well be a latent issue from a possible historical failed upgrade (pure 
guesswork on my part). The admin tried restarting some daemons, and eventually 
only mon2 would run (I'm a bit vague on the details). I don't *think* they 
tried removing the MDS daemons, but it's possible (I'll check tomorrow).

One of my possible plans of attack was to see whether that maybe_resize_cluster 
behaviour can be skipped with some "no"-flag or other config option, then try 
to get quorum established before re-enabling it and, with luck, coming back to 
health. This is probably too wishful a prospect, though.
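For the quorum step itself, the recovery described in the Ceph docs for 
removing monitors from an unhealthy cluster is to shrink the monmap down to the 
one surviving mon. A hedged sketch, assuming mon2 is the keeper, mon1/mon3 are 
the dead ones, and the daemons are systemd-managed (all assumptions for this 
cluster; back up every mon's store directory first):

```shell
# All mons stopped; on the surviving node, stop mon2 too:
systemctl stop ceph-mon@mon2

# Extract the monmap from mon2's store:
ceph-mon -i mon2 --extract-monmap /tmp/monmap

# Inspect it, then remove the dead mons (IDs are guesses):
monmaptool --print /tmp/monmap
monmaptool /tmp/monmap --rm mon1 --rm mon3

# Inject the single-mon map back and restart:
ceph-mon -i mon2 --inject-monmap /tmp/monmap
systemctl start ceph-mon@mon2
```

A lone mon forms quorum with itself, so once "ceph -s" responds, the remaining 
mons can be wiped and redeployed, growing the monmap back out one at a time.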

Thanks again for all your feedback. Even if this just turns out to be a massive 
"rubber ducking" session, you've given me some new ideas and threads to pull. 
My main question now is "which is the 'latest' MON?"

M0les.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
