[ceph-users] Re: Prometheus anomaly in Reef

Tim Holloway Mon, 31 Mar 2025 15:25:21 -0700

Let me close this out by describing where I went afterwards.

I did a "ceph orch apply" with an explicit --placement of 2 servers, one being the existing ceph08 and the other being dell02, the host that was getting the "unknown service" errors.

This gave the dell02 machine a working prometheus and made it shut up about the error. The ceph08 machine, however, reported an error claiming that the prometheus port was already in use. And it was, since prometheus was already running there. The deployer should have recognized that and let the deployment proceed without complaint.

To clear the port-in-use issue, I restarted prometheus on ceph08, and when the error persisted, I did a "ceph mgr fail". That cleared all of the prometheus-related complaints and gave me a HEALTH_OK status.
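For anyone hitting the same thing, the recovery steps above map roughly onto the following cephadm commands. This is a sketch, not a transcript of what was actually typed: the host names ceph08 and dell02 come from this thread, while the daemon name prometheus.ceph08 assumes cephadm's usual <service>.<host> naming and should be confirmed with "ceph orch ps" before use.

```shell
# Pin the prometheus service to exactly these two hosts (names from this thread)
ceph orch apply prometheus --placement="ceph08 dell02"

# List daemons and look for the prometheus.* entries and their hosts
ceph orch ps

# If one still reports its port as in use, restart that daemon
# (daemon name is an assumption; take the real one from "ceph orch ps")
ceph orch daemon restart prometheus.ceph08

# Fail over the active mgr, which restarts the orchestrator module
ceph mgr fail

# Confirm that the prometheus warnings are gone
ceph health detail
```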

I wasn't brave enough to attempt the YAML version again, considering that's where the problem started. I also didn't try an "orch apply" that omitted any host currently running the service, for fear it wouldn't remove the omitted host.

So the problem is fixed, but it took a lot of banging and hammering to make it work.


     Tim


On 3/28/25 19:12, Tim Holloway wrote:

Almost forgot to say: I switched out disks and got rid of the OSD errors. I actually found a third independent location, so it should be a lot more failure-resistant now.
So now it's only the prometheus stuff that's still complaining. Everything else is happy.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
