Since the containers are all podman, I found a "systemctl edit podman"
command that's recommended for setting the proxy in that case.
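For reference, the drop-in created that way ends up as an override file
(e.g. /etc/systemd/system/podman.service.d/override.conf) looking roughly
like this; the proxy host, port, and no_proxy list below are placeholders,
not my actual values:

   [Service]
   Environment="http_proxy=http://proxy.example.com:3128"
   Environment="https_proxy=http://proxy.example.com:3128"
   Environment="no_proxy=localhost,127.0.0.1"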
However, once I did, 2 OSDs went down and cannot be restarted.
In any event, before I did that, ceph health detail was returning
"HEALTH OK".
Now I'm getting this:
HEALTH_ERR 2 failed cephadm daemon(s); Module 'prometheus' has failed:
gaierror(-2, 'Name or service not known'); too many PGs per OSD (865 >
max 560)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
daemon osd.3 on ceph06.internal.mousetech.com is in error state
daemon osd.2 on ceph08.internal.mousetech.com is in error state
[ERR] MGR_MODULE_ERROR: Module 'prometheus' has failed: gaierror(-2,
'Name or service not known')
Module 'prometheus' has failed: gaierror(-2, 'Name or service not
known')
[WRN] TOO_MANY_PGS: too many PGs per OSD (865 > max 560)
On 3/26/25 12:07, Eugen Block wrote:
If you need a proxy to pull the images, I suggest setting it in
containers.conf:
cat /etc/containers/containers.conf
[engine]
env = ["http_proxy=<host>:<port>", "https_proxy=<host>:<port>",
"no_proxy=<your_no_proxy_list>"]
But again, you should be able to see a "failed to pull" message in the
cephadm.log on dell02, or even in 'ceph health detail'; it usually
warns you if the orchestrator failed to place a daemon.
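A quick way to check both (the log path is the cephadm default; adjust
if your cluster logs elsewhere):

   grep -i pull /var/log/ceph/cephadm.log
   ceph health detail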
Zitat von Tim Holloway <t...@mousetech.com>:
One thing I did run into when upgrading was TLS issues pulling
images. I had to set HTTP_PROXY/HTTPS_PROXY and pull manually.
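The manual pull was along these lines (a sketch; the proxy address and
image tag are examples, not the exact values I used):

   http_proxy=http://proxy.example.com:3128 \
   https_proxy=http://proxy.example.com:3128 \
   podman pull quay.io/ceph/ceph:v18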
That may relate to this:
2025-03-26T10:52:16.547985+0000 mgr.dell02.zwnrme (mgr.18015288) 23874
: cephadm [INF] Saving service prometheus spec with placement
dell02.mousetech.com
2025-03-26T10:52:16.560810+0000 mgr.dell02.zwnrme (mgr.18015288)
23875 : cephadm [INF] Saving service node-exporter spec with placement *
2025-03-26T10:52:16.572380+0000 mgr.dell02.zwnrme (mgr.18015288)
23876 : cephadm [INF] Saving service alertmanager spec with placement
dell02.mousetech.com
2025-03-26T10:52:16.583555+0000 mgr.dell02.zwnrme (mgr.18015288)
23878 : cephadm [INF] Saving service grafana spec with placement
dell02.mousetech.com
2025-03-26T10:52:16.601713+0000 mgr.dell02.zwnrme (mgr.18015288)
23879 : cephadm [INF] Saving service ceph-exporter spec with placement *
2025-03-26T10:52:44.139886+0000 mgr.dell02.zwnrme (mgr.18015288)
23898 : cephadm [INF] Restart service mgr
2025-03-26T10:53:02.720157+0000 mgr.ceph08.tlocfi (mgr.18043792) 7 :
cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Bus STARTING
2025-03-26T10:53:02.824138+0000 mgr.ceph08.tlocfi (mgr.18043792) 8 :
cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Serving on
http://10.0.1.58:8765
2025-03-26T10:53:02.962314+0000 mgr.ceph08.tlocfi (mgr.18043792) 9 :
cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Serving on
https://10.0.1.58:7150
2025-03-26T10:53:02.962805+0000 mgr.ceph08.tlocfi (mgr.18043792) 10 :
cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Bus STARTED
2025-03-26T10:53:02.964966+0000 mgr.ceph08.tlocfi (mgr.18043792) 11 :
cephadm [ERR] [26/Mar/2025:10:53:02] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
    self._run(expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
2025-03-26T10:53:03.471114+0000 mgr.ceph08.tlocfi (mgr.18043792) 12 :
cephadm [INF] Updating ceph03.internal.mousetech.com:/etc/ceph/ceph.conf
On 3/26/25 11:39, Eugen Block wrote:
Then maybe the deployment did fail, and we're back to looking at
the cephadm.log.
Zitat von Tim Holloway <t...@mousetech.com>:
It returns nothing. I'd already done the same via "systemctl | grep
prometheus". There simply isn't a systemd service, even though
there should be.
On 3/26/25 11:31, Eugen Block wrote:
There's a service called "prometheus", which can have multiple
daemons, just like any other service (mon, mgr, etc.). To get the
daemon logs you need to provide the daemon name
(prometheus.ceph02 and so on), not just the service name (prometheus).
Can you run the cephadm command I provided? It should show
something like I pasted in the previous message.
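For example (the daemon name below is hypothetical; take the real one
from the ps output):

   ceph orch ps --daemon_type prometheus
   cephadm logs --name prometheus.dell02   # run on the host carrying the daemon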
Zitat von Tim Holloway <t...@mousetech.com>:
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24
Can't list daemon logs, restart, etc., because of "Error EINVAL:
No daemons exist under service name "prometheus". View currently
running services using "ceph orch ls"".
And yet, ceph orch ls shows prometheus as a service.
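One way to untangle that is to compare what the scheduler placed
against what's actually running, then force a redeploy; a sketch using
standard orchestrator commands:

   ceph orch ls prometheus                 # the service spec and its placement/count
   ceph orch ps --daemon_type prometheus   # the daemons that actually exist, if any
   ceph orch redeploy prometheus           # ask cephadm to recreate the service's daemons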
On 3/26/25 11:13, Eugen Block wrote:
ceph orch ls prometheus --export
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io