You could check the Alertmanager container logs
<https://docs.ceph.com/en/quincy/cephadm/operations/#example-of-logging-to-journald>.
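
If you want a record of when the "ceph" target went down and came back, one option is to ask Prometheus itself rather than dig through logs. Below is a minimal sketch in Python. It assumes the alert fires on up{job="ceph"} == 0 and that Prometheus is reachable at something like http://prometheus-host:9095, which is a placeholder for your own host and port. The function names are made up for illustration; only the /api/v1/query_range endpoint belongs to Prometheus.

```python
import json
import time
import urllib.parse
import urllib.request


def find_transitions(samples):
    """Return (timestamp, new_state) pairs where the 'up' value flips.

    samples: list of [unix_ts, "0" or "1"] pairs, as found in the
    "values" field of a Prometheus range-query result.
    """
    transitions = []
    previous = None
    for ts, value in samples:
        state = "up" if float(value) == 1.0 else "down"
        if previous is not None and state != previous:
            transitions.append((float(ts), state))
        previous = state
    return transitions


def fetch_up_series(base_url, hours=24, step="60s", job="ceph"):
    """Query Prometheus for up{job=...} over the last `hours` hours.

    base_url is an assumption, e.g. "http://prometheus-host:9095".
    """
    end = time.time()
    params = urllib.parse.urlencode({
        "query": f'up{{job="{job}"}}',
        "start": end - hours * 3600,
        "end": end,
        "step": step,
    })
    url = f"{base_url}/api/v1/query_range?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return body["data"]["result"]


def report(base_url):
    """Print one line per up/down flip for every scraped instance."""
    for series in fetch_up_series(base_url):
        instance = series["metric"].get("instance", "?")
        for ts, state in find_transitions(series["values"]):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
            print(f"{stamp}  {instance} went {state}")
```

Calling report() with your Prometheus URL should list each flip with a timestamp, which you can then line up against the Prometheus and Alertmanager logs for the same period.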

Kind Regards,
Ernesto


On Wed, Apr 16, 2025 at 4:54 PM Tim Holloway <t...@mousetech.com> wrote:

> I'm thinking it's more likely some sort of latency error.
>
> I have 2 prometheus daemons running at the moment. The hosts files on
> all my ceph servers contain both hostname and FQDN.
>
> This morning the alert was gone. I don't know where I might find a log
> of when it comes and goes, but all was clean, then it wasn't, now it's
> clean again and I haven't been playing with any sort of configurations
> or bouncing hosts or services. It's just appearing and disappearing.
>
>     Tim
>
> On 4/16/25 09:34, Ankush Behl wrote:
> > Just to add to what Ernesto mentioned: your Prometheus container
> > might not be able to reach the target of the ceph scrape job, as it
> > could be using the FQDN or the hostname. Try updating /etc/hosts with
> > the IP and hostname of the ceph scrape job (you can find it in the
> > Prometheus UI -> Status -> Targets); restarting Prometheus after that
> > might help resolve the issue.
> >
> > On Wed, Apr 16, 2025 at 2:10 PM Ernesto Puerta <epuer...@redhat.com> wrote:
> >
> >> Don't shoot the messenger. Dashboard is just displaying the alert that
> >> Prometheus/AlertManager is reporting. The alert definition is here
> >> <https://github.com/ceph/ceph/blob/3993779cde9d10512f4a26f87487d11103ac1bd0/monitoring/ceph-mixin/prometheus_alerts.yml#L342-L351>.
> >> As you may see, it's based on the status of the Prometheus "ceph" scrape
> >> job. This alert is vital, because if the "ceph" job is not scraping metrics
> >> from the "mgr/prometheus" module, no other Ceph alert condition will be
> >> detected, therefore creating a false sense of confidence.
> >>
> >> You may start by having a look at the Prometheus and/or Alertmanager
> >> web UIs, or by checking their logs.
> >>
> >> Kind Regards,
> >> Ernesto
> >>
> >>
> >> On Tue, Apr 15, 2025 at 7:28 PM Tim Holloway <t...@mousetech.com> wrote:
> >>
> >>> Although I've had this problem since at least Pacific, I'm still seeing
> >>> it on Reef.
> >>>
> >>> After much pain and suffering (covered elsewhere), I got my Prometheus
> >>> services deployed as intended, Ceph health OK, green across the board.
> >>>
> >>> However, over the weekend, the dreaded
> >>> "CephMgrPrometheusModuleInactive" alert has returned to the Dashboard.
> >>> "The mgr/prometheus module at dell02.mousetech.com:9283 is
> >>> unreachable."
> >>>
> >>> It's a blatant lie.
> >>>
> >>> I still get "Ceph HEALTH_OK". All monitor status commands show
> >>> everything running. Checking ports on the host shows it's listening.
> >>>
> >>> More to the point, I can send my desktop browser to
> >>> http://dell02.mousetech.com:9283 and get a page that will allow me to
> >>> see the metrics. So everyone can see it but the Dashboard!
> >>>
> >>> I did have some issues when the other prometheus host couldn't resolve
> >>> the hostname, but I fixed that for all ceph hosts and it was green for
> >>> days. Now the error is back. Restarting Prometheus didn't help.
> >>>
> >>> How is the Dashboard hallucinating this???
> >>>
> >>>     Tim
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
