Right, systemctl edit works as well. But I'm confused about the down
OSDs. Did you set the proxy on all hosts? I ask because the down OSDs
are on ceph06 and ceph08, while prometheus is supposed to run on
dell02. Are you sure those are related?
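For reference, the drop-in that 'systemctl edit' creates would look
roughly like this (placeholder proxy host/port, adjust to your
environment):

[Service]
Environment="http_proxy=http://<proxy-host>:<port>"
Environment="https_proxy=http://<proxy-host>:<port>"
Environment="no_proxy=<your_no_proxy_list>"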
I would recommend removing the prometheus service entirely and
starting from scratch:
ceph orch rm prometheus
ceph mgr module disable prometheus
ceph mgr fail
Wait a minute, then enable it again and deploy prometheus:
ceph orch apply -i prometheus.yaml
ceph mgr module enable prometheus
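For prometheus.yaml you can reuse the spec you exported further down
in this thread, something like:

service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24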
Quoting Tim Holloway <t...@mousetech.com>:
Since the containers are all podman, I found a "systemctl edit
podman" command that's recommended for setting the proxy there.
However, once I did that, 2 OSDs went down and cannot be restarted.
In any event, before I did that, ceph health detail was returning
"HEALTH_OK".
Now I'm getting this:
HEALTH_ERR 2 failed cephadm daemon(s); Module 'prometheus' has
failed: gaierror(-2, 'Name or service not known'); too many PGs per
OSD (865 > max 560)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
daemon osd.3 on ceph06.internal.mousetech.com is in error state
daemon osd.2 on ceph08.internal.mousetech.com is in error state
[ERR] MGR_MODULE_ERROR: Module 'prometheus' has failed: gaierror(-2,
'Name or service not known')
Module 'prometheus' has failed: gaierror(-2, 'Name or service not known')
[WRN] TOO_MANY_PGS: too many PGs per OSD (865 > max 560)
On 3/26/25 12:07, Eugen Block wrote:
If you need a proxy to pull the images, I suggest setting it in
containers.conf:
cat /etc/containers/containers.conf
[engine]
env = ["http_proxy=<host>:<port>", "https_proxy=<host>:<port>",
"no_proxy=<your_no_proxy_list>"]
But again, you should be able to see a failed pull in the cephadm.log
on dell02, or even in 'ceph health detail'; it usually warns you if
the orchestrator failed to place a daemon.
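For example, something like this on dell02 should surface pull
failures (assuming the default log location):

grep -iE 'pull|error' /var/log/ceph/cephadm.log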
Quoting Tim Holloway <t...@mousetech.com>:
One thing I did run into when upgrading was TLS issues pulling
images. I had to set HTTP_PROXY/HTTPS_PROXY and pull manually.
That may relate to this:
2025-03-26T10:52:16.547985+0000 mgr.dell02.zwnrme (mgr.18015288)
23874 : cephadm [INF] Saving service prometheus spec with
placement dell02.mousetech.com
2025-03-26T10:52:16.560810+0000 mgr.dell02.zwnrme (mgr.18015288)
23875 : cephadm [INF] Saving service node-exporter spec with
placement *
2025-03-26T10:52:16.572380+0000 mgr.dell02.zwnrme (mgr.18015288)
23876 : cephadm [INF] Saving service alertmanager spec with
placement dell02.mousetech.com
2025-03-26T10:52:16.583555+0000 mgr.dell02.zwnrme (mgr.18015288)
23878 : cephadm [INF] Saving service grafana spec with placement
dell02.mousetech.com
2025-03-26T10:52:16.601713+0000 mgr.dell02.zwnrme (mgr.18015288)
23879 : cephadm [INF] Saving service ceph-exporter spec with
placement *
2025-03-26T10:52:44.139886+0000 mgr.dell02.zwnrme (mgr.18015288)
23898 : cephadm [INF] Restart service mgr
2025-03-26T10:53:02.720157+0000 mgr.ceph08.tlocfi (mgr.18043792) 7
: cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Bus STARTING
2025-03-26T10:53:02.824138+0000 mgr.ceph08.tlocfi (mgr.18043792) 8
: cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Serving on
http://10.0.1.58:8765
2025-03-26T10:53:02.962314+0000 mgr.ceph08.tlocfi (mgr.18043792) 9
: cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Serving on
https://10.0.1.58:7150
2025-03-26T10:53:02.962805+0000 mgr.ceph08.tlocfi (mgr.18043792)
10 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE Bus STARTED
2025-03-26T10:53:02.964966+0000 mgr.ceph08.tlocfi (mgr.18043792)
11 : cephadm [ERR] [26/Mar/2025:10:53:02] ENGINE Error in
HTTPServer.serve
Traceback (most recent call last):
  File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
    self._run(expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
2025-03-26T10:53:03.471114+0000 mgr.ceph08.tlocfi (mgr.18043792)
12 : cephadm [INF] Updating
ceph03.internal.mousetech.com:/etc/ceph/ceph.conf
On 3/26/25 11:39, Eugen Block wrote:
Then maybe the deployment did fail and we're back to looking at the
cephadm.log.
Quoting Tim Holloway <t...@mousetech.com>:
It returns nothing. I'd already done the same via "systemctl | grep
prometheus". There simply isn't a systemd service, even though there
should be.
On 3/26/25 11:31, Eugen Block wrote:
There’s a service called „prometheus“, which can have multiple
daemons, just like any other service (mon, mgr etc). To get the
daemon logs you need to provide the daemon name
(prometheus.ceph02.andsopn), not just the service name
(prometheus).
Can you run the cephadm command I provided? It should show
something like I pasted in the previous message.
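For example, once a prometheus daemon actually exists, something like
this would find its full name and restart it (the suffix is just a
placeholder):

ceph orch ps | grep prometheus
ceph orch daemon restart prometheus.dell02.<suffix>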
Quoting Tim Holloway <t...@mousetech.com>:
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24
Can't list daemon logs, run restart, etc., because "Error EINVAL: No
daemons exist under service name "prometheus". View currently running
services using "ceph orch ls"".
And yet, ceph orch ls shows prometheus as a service.
On 3/26/25 11:13, Eugen Block wrote:
ceph orch ls prometheus --export
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io