Ok, I'll try one last time and ask for cephadm.log output. ;-) And the active MGR's log might help here as well.
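Something like this should gather them (fsid and mgr name are placeholders; the file paths only apply if log_to_file is enabled, otherwise use the journal):

less /var/log/ceph/cephadm.log                    # on the host in question
less /var/log/ceph/<fsid>/ceph-mgr.<name>.log     # active MGR, file logging
journalctl -u ceph-<fsid>@mgr.<name>.service      # active MGR, journald
ceph log last 100 info cephadm                    # recent cephadm channel entries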

Quoting Tim Holloway <t...@mousetech.com>:

No change.

On 3/26/25 13:01, Tim Holloway wrote:
It's strange, but for a while I'd been trying to get prometheus working on ceph08, so I don't know.

All I do know is that immediately after editing the proxy settings, I got indications that those 2 OSDs had gone down.

What's REALLY strange is that their logs seem to hint that somehow they shifted from administered to legacy configuration. That is, looking for OSD resources under /var/lib/ceph instead of /var/lib/ceph/<fsid>.
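If it helps, this is roughly how I'd check which flavor they're running as on ceph06/ceph08 (fsid is a placeholder, OSD IDs taken from the health output below):

systemctl list-units 'ceph*' | grep osd
# cephadm-managed OSDs run as ceph-<fsid>@osd.N.service; a plain ceph-osd@N.service would be a legacy unit
ls /var/lib/ceph/<fsid>/osd.3      # managed daemon's data dir
ls /var/lib/ceph/osd/ceph-3        # legacy location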

Anyway, I'll try yanking and re-deploying prometheus and maybe that will magically cure something.

On 3/26/25 12:53, Eugen Block wrote:
Right, systemctl edit works as well. But I'm confused about the down OSDs. Did you set the proxy on all hosts? Because the down OSDs are on ceph06 and ceph08, while prometheus is supposed to run on dell02. Are you sure those are related?

I would recommend removing the prometheus service entirely and starting from scratch:

ceph orch rm prometheus
ceph mgr module disable prometheus
ceph mgr fail

Wait a minute, then enable it again and deploy prometheus:

ceph orch apply -i prometheus.yaml
ceph mgr module enable prometheus
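
Once the orchestrator has redeployed it, something like this should confirm it (just a sanity check):

ceph orch ls prometheus --export
ceph orch ps --daemon-type prometheus
ceph health detail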



Quoting Tim Holloway <t...@mousetech.com>:

Since the containers are all podman, I found a "systemctl edit podman" approach that's recommended for setting the proxy.
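For the record, roughly what that drop-in ends up containing (proxy values here are made up):

# /etc/systemd/system/podman.service.d/override.conf
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.internal.mousetech.com"

As far as I can tell, that only affects podman.service itself, not podman invocations made by the per-daemon ceph systemd units, which may be why the containers.conf approach you suggest below is preferable.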

However, once I did, 2 OSDs went down and cannot be restarted.

In any event, before I did that, ceph health detail was returning "HEALTH_OK".

Now I'm getting this:

HEALTH_ERR 2 failed cephadm daemon(s); Module 'prometheus' has failed: gaierror(-2, 'Name or service not known'); too many PGs per OSD (865 > max 560)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon osd.3 on ceph06.internal.mousetech.com is in error state
    daemon osd.2 on ceph08.internal.mousetech.com is in error state
[ERR] MGR_MODULE_ERROR: Module 'prometheus' has failed: gaierror(-2, 'Name or service not known')
    Module 'prometheus' has failed: gaierror(-2, 'Name or service not known')
[WRN] TOO_MANY_PGS: too many PGs per OSD (865 > max 560)

On 3/26/25 12:07, Eugen Block wrote:
If you need a proxy to pull the images, I suggest setting it in containers.conf:

cat /etc/containers/containers.conf
[engine]
env = ["http_proxy=<host>:<port>", "https_proxy=<host>:<port>", "no_proxy=<your_no_proxy_list>"]

But again, you should be able to see a failed pull in the cephadm.log on dell02, or even in 'ceph health detail'; it usually warns you if the orchestrator failed to place a daemon.
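
If it helps, a quick way to spot that on dell02 (log path is the cephadm default):

grep -iE 'pull|ERROR' /var/log/ceph/cephadm.log | tail -n 50
ceph log last 100 info cephadm    # cephadm channel of the cluster log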

Quoting Tim Holloway <t...@mousetech.com>:

One thing I did run into when upgrading was TLS issues pulling images. I had to set HTTP_PROXY/HTTPS_PROXY and pull manually.

That may relate to this:

2025-03-26T10:52:16.547985+0000 mgr.dell02.zwnrme (mgr.18015288) 23874 : cephadm [INF] Saving service prometheus spec with placement dell02.mousetech.com
2025-03-26T10:52:16.560810+0000 mgr.dell02.zwnrme (mgr.18015288) 23875 : cephadm [INF] Saving service node-exporter spec with placement *
2025-03-26T10:52:16.572380+0000 mgr.dell02.zwnrme (mgr.18015288) 23876 : cephadm [INF] Saving service alertmanager spec with placement dell02.mousetech.com
2025-03-26T10:52:16.583555+0000 mgr.dell02.zwnrme (mgr.18015288) 23878 : cephadm [INF] Saving service grafana spec with placement dell02.mousetech.com
2025-03-26T10:52:16.601713+0000 mgr.dell02.zwnrme (mgr.18015288) 23879 : cephadm [INF] Saving service ceph-exporter spec with placement *
2025-03-26T10:52:44.139886+0000 mgr.dell02.zwnrme (mgr.18015288) 23898 : cephadm [INF] Restart service mgr
2025-03-26T10:53:02.720157+0000 mgr.ceph08.tlocfi (mgr.18043792) 7 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Bus STARTING
2025-03-26T10:53:02.824138+0000 mgr.ceph08.tlocfi (mgr.18043792) 8 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Serving on http://10.0.1.58:8765
2025-03-26T10:53:02.962314+0000 mgr.ceph08.tlocfi (mgr.18043792) 9 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Serving on https://10.0.1.58:7150
2025-03-26T10:53:02.962805+0000 mgr.ceph08.tlocfi (mgr.18043792) 10 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Bus STARTED
2025-03-26T10:53:02.964966+0000 mgr.ceph08.tlocfi (mgr.18043792) 11 : cephadm [ERR] [26/Mar/2025:10:53:02] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
    self._run(expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)

2025-03-26T10:53:03.471114+0000 mgr.ceph08.tlocfi (mgr.18043792) 12 : cephadm [INF] Updating ceph03.internal.mousetech.com:/etc/ceph/ceph.conf

On 3/26/25 11:39, Eugen Block wrote:
Then maybe the deployment did fail and we're back to looking at the cephadm.log.


Quoting Tim Holloway <t...@mousetech.com>:

It returns nothing. I'd already done the same via "systemctl | grep prometheus". There simply isn't a systemd service, even though there should be.

On 3/26/25 11:31, Eugen Block wrote:
There's a service called "prometheus", which can have multiple daemons, just like any other service (mon, mgr, etc.). To get the daemon logs, you need to provide the daemon name (e.g. prometheus.ceph02.andsopn), not just the service name (prometheus).
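
Roughly (host name here is just an example, and the exact suffix varies):

ceph orch ps --service-name prometheus     # shows the daemon name(s) and their state
# then, on the host that runs it:
cephadm logs --name prometheus.dell02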

Can you run the cephadm command I provided? It should show something like what I pasted in the previous message.

Quoting Tim Holloway <t...@mousetech.com>:

service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24

Can't list daemon logs, run restart, etc., because "Error EINVAL: No daemons exist under service name "prometheus". View currently running services using "ceph orch ls""

And yet, ceph orch ls shows prometheus as a service.

On 3/26/25 11:13, Eugen Block wrote:
ceph orch ls prometheus --export
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io