Just to add to what Ernesto mentioned: your Prometheus container might not be able to reach the ceph scrape job's target, as the target could be referenced by FQDN or hostname. Try updating /etc/hosts with the IP and hostname of the ceph scrape target (you can find it in the Prometheus UI -> Status -> Targets), then restart Prometheus. That might help resolve the issue.
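If it helps, here is a minimal sketch for checking name resolution and TCP reachability of the scrape target from wherever Prometheus runs. It assumes Python 3 is available in the same network namespace as Prometheus, which may not be true for the stock container image, and `check_target` is a hypothetical helper name, not part of Ceph or Prometheus. It only tests DNS and the TCP connect, not the scrape itself.

```python
import socket


def check_target(host, port, timeout=3.0):
    """Return (resolved_ips, reachable) for a scrape target.

    resolved_ips is empty if the name cannot be resolved at all.
    reachable is True only if a TCP connection to host:port succeeds.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve from here (the /etc/hosts or DNS case).
        return [], False

    # Collect the distinct addresses the name resolves to.
    ips = sorted({info[4][0] for info in infos})

    try:
        with socket.create_connection((host, port), timeout=timeout):
            return ips, True
    except OSError:
        # Resolved, but refused, filtered, or timed out.
        return ips, False
```

For the target in Tim's alert you would call `check_target("dell02.mousetech.com", 9283)`. An empty list points at name resolution; addresses with `False` point at routing, firewalling, or the module not listening on that address.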
On Wed, Apr 16, 2025 at 2:10 PM Ernesto Puerta <epuer...@redhat.com> wrote:

> Don't shoot the messenger. Dashboard is just displaying the alert that
> Prometheus/AlertManager is reporting. The alert definition is here
> <https://github.com/ceph/ceph/blob/3993779cde9d10512f4a26f87487d11103ac1bd0/monitoring/ceph-mixin/prometheus_alerts.yml#L342-L351>.
> As you may see, it's based on the status of the Prometheus "ceph" scrape
> job. This alert is vital, because if the "ceph" job is not scraping metrics
> from the "mgr/prometheus" module, no other Ceph alert condition will be
> detected, therefore creating a false sense of confidence.
>
> You may start having a look at Prometheus and/or Alertmanager web UIs, or
> checking their logs.
>
> Kind Regards,
> Ernesto
>
>
> On Tue, Apr 15, 2025 at 7:28 PM Tim Holloway <t...@mousetech.com> wrote:
>
> > Although I've had this problem since at least Pacific, I'm still seeing
> > it on Reef.
> >
> > After much pain and suffering (covered elsewhere), I got my Prometheus
> > services deployed as intended, Ceph health OK, green across the board.
> >
> > However, over the weekend, the dreaded
> > "CephMgrPrometheusModuleInactive" alert has returned to the Dashboard.
> > "The mgr/prometheus module at dell02.mousetech.com:9283 is
> > unreachable."
> >
> > It's a blatant lie.
> >
> > I still get "Ceph HEALTH_OK". All monitor status command show
> > everything running. Checking ports on the host says it's listening.
> >
> > More to the point, I can send my desktop browser to
> > http://dell02.mousetech.com:9283 and get a page that will allow me to
> > see the metrics. So everyone can see it but the Dashboard!
> >
> > I did have some issues when the other prometheus host couldn't resolve
> > the hostname, but I fixed that for all ceph hosts and it was green for
> > days. Now the error is back. Restarting Prometheus didn't help.
> >
> > How is the Dashboard hallucinating this???
> >
> > Tim
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io