On Wed, 18 Jun 2025, at 18:09, Eugen Block wrote:
> That does look strange indeed, either an upgrade went wrong or someone
> already fiddled with the monmap, I'd say. But anyway, I wouldn't try
> to deploy a 4th mon since it would want to sync the store, but we
> don't know in which state the store actually is. And besides from
> that, 2 out of 4 MONs still isn't a quorum, so there's no real
> benefit. So my best bet would be on the mon with the most recent
> store. And if the cluster comes back up with one mon, you'll need to
> wipe the traces of the previous mons so DeepSea can redeploy
> additional mons cleanly. Or is the cluster not managed by DeepSea
> anymore?
Replies to fragments from above are below:
> either an upgrade went wrong or someone already fiddled with the monmap
That's entirely possible. I'm playing the role of a "guy who knows a bit about
Ceph" to try and un-explode an old cluster on unsupported OS and hardware. The
original deployers are long since gone and the day-to-day admins were never
given much handover. There are legends of several phases of upgrades and
deployment system replacements, but concrete documentation is thin on the
ground. Certainly I recently found evidence of failed OS upgrades that broke
part of the RGW services years ago.
I had previously documented a plan to migrate the cluster to a new/supported
hardware, OS and Ceph version, but the client was still thinking about it when
this happened.
> wouldn't try to deploy a 4th mon
The idea for the 4th MON was just to see if I could deploy a new MON without
breaking the cluster much further. However, given I can't get more than one MON
to start, it's pretty broken right now. If that deployment worked, I intended
to remove and redeploy each of the other two MONs before retiring the 4th MON
again. A side benefit is that it lets me test part of my cluster upgrade plan.
One of the cluster's clients is OpenStack, which in my experience is pretty
"sentimental" about its set of MON IPs.
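For reference, the alternative to adding a 4th MON would be the monmap-surgery
route from the Ceph docs: come back up on the single best survivor by editing
the dead MONs out of its map. A rough sketch (mon IDs and paths here are
assumptions, adjust for the actual layout; run with the daemons stopped):

```shell
# Extract the surviving mon's current monmap (daemon must be stopped):
ceph-mon -i mon2 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap          # inspect epoch and members first
# Remove the dead mons so the survivor can form a quorum of one:
monmaptool /tmp/monmap --rm mon1 --rm mon3
# Inject the edited map back, then start the daemon:
ceph-mon -i mon2 --inject-monmap /tmp/monmap
```

This only makes sense once you've decided which store is the most recent, and
it's worth backing up each mon's data directory before touching anything.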
> mon with the most recent store
How would I find out which MON that is? I'm told mon3 was the last one
operating (but it spews a wall of "e6 handle_auth_request failed to assign
global_id" logs when running). mon2 is the one that survives if you try to
start all of them. I've tried inspecting the on-disk store DBs (RocksDB, I
believe, rather than SQLite), but can't get much comprehensible info out of
them yet (I don't have any experience tinkering with these key-value stores,
though I'm OK with an "actual" SQL REPL). I can't get quorum, so I can't run
"ceph ..." command lines, but I can talk to each of the MONs on their Unix
sockets when they're running.
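Since the admin sockets do work, one rough way to rank the survivors is to
compare what each daemon reports via `ceph daemon mon.<id> mon_status` (the
monmap epoch and election epoch). A sketch of that comparison; the field names
match the real command's JSON output, but the values below are made up:

```python
import json

# Trimmed, hypothetical samples of `ceph daemon mon.<id> mon_status` output.
statuses = {
    "mon2": '{"election_epoch": 140, "monmap": {"epoch": 6}}',
    "mon3": '{"election_epoch": 152, "monmap": {"epoch": 6}}',
}

def newest_mon(statuses):
    """Pick the mon reporting the highest (monmap epoch, election epoch).
    Only a rough proxy for "most recent store" -- the committed paxos
    versions in the store itself are the real arbiter."""
    def key(item):
        s = json.loads(item[1])
        return (s["monmap"]["epoch"], s["election_epoch"])
    return max(statuses.items(), key=key)[0]

print(newest_mon(statuses))  # "mon3" with the sample values above
```

With the daemons stopped, `ceph-monstore-tool <mon-data-path> dump-keys` can
also (if I recall its subcommands correctly) show what's actually sitting in
each store, which would settle it more directly.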
> Or is the cluster not managed by DeepSea anymore?
I don't think it is. None of the admins (nor I) have very deep experience in
Salt stuff (I'm more Ansible). The aforementioned "legends" of the system's
lifetime also say there were multiple different management systems over the
years. I've mainly used the existing Salt config to break-into managed nodes I
didn't yet have an account for and to do fleet-wide "shell command" operations.
Given the probability of historic broken OS upgrades and possibly abandoned
Salt management, I'd be wary of trying to use this for deployment automation.
(Now replying to a later email:)
> Although I'm not a dev, I looked into the code [0] anyway.
>
> The comments before the maybe_resize_cluster function say:
>
> * If a cluster is undersized (with respect to max_mds), then
> * attempt to find daemons to grow it. If the cluster is oversized
> * (with respect to max_mds) then shrink it by stopping its highest rank.
>
> Is it possible that an operator/admin tried to resize (shrink or grow
> the number of MDS daemons) the MDS culster? Or was a DeepSea stage
> executed in order to deploy additional daemons? Maybe some history
> could help understand what might have happened.
Yes, I saw all that too. I was told this all started because one of the admins
noticed that the CephFS service was slow and reporting laggy MDSes. This may
well be a latent issue from a possible historical failed upgrade (pure
guesswork here). The admin tried restarting some daemons and eventually only
mon2 would run (I'm a bit vague on the details). I don't *think* they tried
removing the MDS daemons, but it's possible (I'll check tomorrow).
One possible plan of attack was to see whether that "maybe resize..." method
could be skipped with some "No"-flag or other config setting, then try to get
quorum established before re-enabling it and, with luck, coming back to
health. That's probably too wishful a prospect, though.
Thanks again for all your feedback. Even if this just turns out to be a massive
"rubber ducking" session, you've given me some new ideas and threads to pull.
My main question now is "which is the 'latest' MON?"
M0les.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io