Let me close this out by describing what I did afterwards.
I ran a "ceph orch apply" with an explicit --placement of two hosts:
the existing ceph08, and dell02, the host that was getting the
"unknown service" errors.
That gave dell02 a working prometheus and stopped the error there.
ceph08, however, reported an error claiming that the prometheus port
was already in use. It was, since prometheus was already running on
that host. The deployer should have recognized that and let the
deployment proceed without complaint.
To clear the port-in-use issue, I restarted prometheus on ceph08, and
when the error persisted, I did a "ceph mgr fail". That cleared all of
the prometheus-related complaints and gave me a HEALTH_OK status.
I wasn't brave enough to attempt the YAML version again, considering
that's where the problem started. I also didn't try an "orch apply"
that omitted a host already running the service, for fear it wouldn't
remove the service from the omitted host.
So the problem is fixed, but it took a lot of banging and hammering to
make it work.
Tim
On 3/28/25 19:12, Tim Holloway wrote:
Almost forgot to say: I switched out disks and got rid of the OSD
errors. I also found a third independent location, so it should be
a lot more failure-resistant now.
So now it's only the prometheus stuff that's still complaining.
Everything else is happy.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io