You could check the Alertmanager container logs
<https://docs.ceph.com/en/quincy/cephadm/operations/#example-of-logging-to-journald>.

Kind Regards,
Ernesto

On Wed, Apr 16, 2025 at 4:54 PM Tim Holloway <t...@mousetech.com> wrote:

> I'm thinking more some sort of latency error.
>
> I have 2 prometheus daemons running at the moment. The hosts files on
> all my ceph servers contain both hostname and FQDN.
>
> This morning the alert was gone. I don't know where I might find a log
> of when it comes and goes, but all was clean, then it wasn't, now it's
> clean again and I haven't been playing with any sort of configurations
> or bouncing hosts or services. It's just appearing and disappearing.
>
> Tim
>
> On 4/16/25 09:34, Ankush Behl wrote:
> > Just to add to what Ernesto mentioned: your prometheus container
> > might not be able to reach the ceph scrape job, as it could be using
> > the FQDN or the hostname. Try updating /etc/hosts with the IP and
> > hostname of the ceph scrape job (you can find it on prometheus UI ->
> > status -> targets); restarting prometheus after that might help
> > resolve the issue.
> >
> > On Wed, Apr 16, 2025 at 2:10 PM Ernesto Puerta <epuer...@redhat.com> wrote:
> >
> >> Don't shoot the messenger. Dashboard is just displaying the alert that
> >> Prometheus/AlertManager is reporting. The alert definition is here
> >> <https://github.com/ceph/ceph/blob/3993779cde9d10512f4a26f87487d11103ac1bd0/monitoring/ceph-mixin/prometheus_alerts.yml#L342-L351>.
> >> As you may see, it's based on the status of the Prometheus "ceph" scrape
> >> job. This alert is vital, because if the "ceph" job is not scraping metrics
> >> from the "mgr/prometheus" module, no other Ceph alert condition will be
> >> detected, therefore creating a false sense of confidence.
> >>
> >> You may start by having a look at the Prometheus and/or Alertmanager
> >> web UIs, or by checking their logs.
> >>
> >> Kind Regards,
> >> Ernesto
> >>
> >>
> >> On Tue, Apr 15, 2025 at 7:28 PM Tim Holloway <t...@mousetech.com> wrote:
> >>
> >>> Although I've had this problem since at least Pacific, I'm still seeing
> >>> it on Reef.
> >>>
> >>> After much pain and suffering (covered elsewhere), I got my Prometheus
> >>> services deployed as intended, Ceph health OK, green across the board.
> >>>
> >>> However, over the weekend, the dreaded
> >>> "CephMgrPrometheusModuleInactive" alert has returned to the Dashboard.
> >>> "The mgr/prometheus module at dell02.mousetech.com:9283 is
> >>> unreachable."
> >>>
> >>> It's a blatant lie.
> >>>
> >>> I still get "Ceph HEALTH_OK". All monitor status commands show
> >>> everything running. Checking ports on the host says it's listening.
> >>>
> >>> More to the point, I can send my desktop browser to
> >>> http://dell02.mousetech.com:9283 and get a page that will allow me to
> >>> see the metrics. So everyone can see it but the Dashboard!
> >>>
> >>> I did have some issues when the other prometheus host couldn't resolve
> >>> the hostname, but I fixed that for all ceph hosts and it was green for
> >>> days. Now the error is back. Restarting Prometheus didn't help.
> >>>
> >>> How is the Dashboard hallucinating this???
> >>>
> >>> Tim
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
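
The thread points at two things worth checking: how Prometheus itself sees the "ceph" scrape job (UI -> status -> targets), and when the target went down and came back. Both can be read from the standard Prometheus HTTP API (`/api/v1/targets` and `/api/v1/query_range` on the built-in `up` series). The following is only a rough sketch, not something posted in the thread: the base URL passed to `report()`, the 24-hour window, the 60-second step, and all function names are assumptions made for illustration.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_json(base_url, path, params=None):
    """GET a Prometheus HTTP API endpoint and return the decoded JSON body."""
    query = "?" + urllib.parse.urlencode(params) if params else ""
    with urllib.request.urlopen(base_url + path + query, timeout=10) as resp:
        return json.load(resp)


def job_target_health(targets_response, job="ceph"):
    """Return (scrapeUrl, health, lastError) for each active target of `job`."""
    rows = []
    for target in targets_response["data"]["activeTargets"]:
        if target["labels"].get("job") == job:
            rows.append(
                (target["scrapeUrl"], target["health"], target.get("lastError", ""))
            )
    return rows


def down_intervals(values):
    """Collapse [timestamp, value] samples of `up` into (start, end) runs where up == 0."""
    intervals = []
    start = None
    last = None
    for ts, value in values:
        if float(value) == 0:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            intervals.append((start, last))
            start = None
    if start is not None:
        intervals.append((start, last))
    return intervals


def report(base_url, job="ceph", hours=24, step=60):
    """Print current target health and recent down periods (hypothetical helper)."""
    targets = fetch_json(base_url, "/api/v1/targets")
    for url, health, error in job_target_health(targets, job):
        print(f"{url}: {health} {error}")

    now = time.time()
    history = fetch_json(
        base_url,
        "/api/v1/query_range",
        {
            "query": f'up{{job="{job}"}}',
            "start": now - hours * 3600,
            "end": now,
            "step": step,
        },
    )
    for series in history["data"]["result"]:
        instance = series["metric"].get("instance", "?")
        for start, end in down_intervals(series["values"]):
            print(instance, "down from", time.ctime(start), "to", time.ctime(end))
```

Calling `report("http://<prometheus-host>:<port>")` once against each of the two Prometheus daemons would show whether only one of them is failing to scrape, and the `lastError` field usually says why (name resolution, timeout, connection refused). Periods when Prometheus itself was not running appear as gaps in the samples rather than zeros, so they are not reported as down intervals.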