Let me close this out by describing where I went afterwards.

I ran "ceph orch apply" with an explicit --placement of two servers: the existing ceph08, and dell02, the host that was getting the "unknown service" errors.
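
In command form, that would be roughly the following. Treat it as a sketch: the service type and host names are the ones from this thread, but the exact quoting and options are an assumption, not a paste from a terminal.

    # Pin prometheus to exactly these two hosts (names from this thread)
    ceph orch apply prometheus --placement="ceph08 dell02"

    # Check where the orchestrator says the service is running
    ceph orch ls prometheus
    ceph orch ps | grep prometheus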

This gave dell02 a working prometheus, and it shut up about the error. ceph08, however, reported an error of its own, claiming that the prometheus port was already in use. Which it was, since prometheus was already running there. The deployer should have recognized that and let the deployment proceed without complaint.

To clear the port-in-use issue, I restarted prometheus on ceph08 and then, when the error persisted, did a "ceph mgr fail". That cleared all of the prometheus-related complaints and gave me a HEALTH_OK status.
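
Spelled out, that sequence might look like the lines below. The daemon name prometheus.ceph08 is an assumption based on the usual <type>.<host> naming; use whatever name "ceph orch ps" actually shows, and note that the restart could equally be done some other way.

    # Find the real daemon name first
    ceph orch ps | grep prometheus

    # Restart only the prometheus daemon on ceph08 (name assumed)
    ceph orch daemon restart prometheus.ceph08

    # If the stale error persists, fail over the active manager
    ceph mgr fail

    # Confirm the cluster state afterwards
    ceph health detail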

I wasn't brave enough to attempt the YAML version again, considering that's where the problem started. I also didn't try an "orch apply" that omitted a host with a running service, for fear it wouldn't remove the service from the omitted host.
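
For anyone who does want to retry the YAML route, a spec equivalent to the placement above would presumably look like the following. This is untested here; the file name is made up for illustration, and --dry-run is there so the orchestrator only reports what it would change rather than changing it.

    # prometheus.yaml (hypothetical file name)
    service_type: prometheus
    placement:
      hosts:
        - ceph08
        - dell02

    # Preview the effect without applying it
    ceph orch apply -i prometheus.yaml --dry-run

    # Compare against the spec the cluster currently holds
    ceph orch ls prometheus --export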

So the problem is fixed, but it took a lot of banging and hammering to make it work.

     Tim


On 3/28/25 19:12, Tim Holloway wrote:
Almost forgot to say. I switched out disks and got rid of the OSD errors. I actually found a third independent location, so it should be a lot more failure resistant now.

So now it's only the prometheus stuff that's still complaining. Everything else is happy.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io