Well, here's an excerpt from the /var/log/ceph/cephadm.log. I don't
know if that's the mechanism or file you mean, though.
2025-03-26 13:11:09,382 7fb2abc38740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:12:10,219 7fc4fd405740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:13:11,502 7f2ef3c76740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:14:12,372 7f3566bef740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:15:13,301 7f660e204740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:15:20,880 7f93b227e740 DEBUG
--------------------------------------------------------------------------------
cephadm ['ls']
2025-03-26 13:15:20,904 7f93b227e740 DEBUG /usr/bin/podman: 5.2.2
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
2149e16fa2ce,11.51MB / 33.24GB
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
65529d6ad1ac,17.69MB / 33.24GB
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
51b1d190dfb9,99.79MB / 33.24GB
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
59a865e3bcc5,6.791MB / 33.24GB
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
dd3203f6f3bb,410.2MB / 33.24GB
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
34177c4e5761,1.764GB / 33.24GB
2025-03-26 13:15:20,939 7f93b227e740 DEBUG /usr/bin/podman:
bfe17e83b288,534.2MB / 33.24GB
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
2149e16fa2ce,0.00%
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
65529d6ad1ac,0.26%
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
51b1d190dfb9,0.22%
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
59a865e3bcc5,0.02%
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
dd3203f6f3bb,0.86%
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
34177c4e5761,1.67%
2025-03-26 13:15:20,972 7f93b227e740 DEBUG /usr/bin/podman:
bfe17e83b288,0.25%
2025-03-26 13:15:20,985 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:20,993 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,024 7f93b227e740 DEBUG /usr/bin/podman:
2149e16fa2ce8769bf3be9e6e25eec61b8e027b0e8699f1cb7d5f113fc4aac66,quay.io/prometheus/node-exporter:v1.5.0,0da6a335fe1356545476b749c68f022c897d
e3a2139e8f0054f6937349ee2b83,2025-03-25 16:52:31.644234532 -0400 EDT,
2025-03-26 13:15:21,057 7f93b227e740 DEBUG /usr/bin/podman:
[quay.io/prometheus/node-exporter@sha256:39c642b2b337e38c18e80266fb14383754178202f40103646337722a594d984c
quay.io/prometheus/node-exporter@sh
a256:fa8e5700b7762fffe0674e944762f44bb787a7e44d97569fe55348260453bf80]
2025-03-26 13:15:21,111 7f93b227e740 DEBUG /usr/bin/podman:
node_exporter, version 1.5.0 (branch: HEAD, revision:
1b48970ffcf5630534fb00bb0687d73c66d1c959)
2025-03-26 13:15:21,111 7f93b227e740 DEBUG /usr/bin/podman: build
user: root@6e7732a7b81b
2025-03-26 13:15:21,111 7f93b227e740 DEBUG /usr/bin/podman: build
date: 20221129-18:59:09
2025-03-26 13:15:21,111 7f93b227e740 DEBUG /usr/bin/podman: go
version: go1.19.3
2025-03-26 13:15:21,111 7f93b227e740 DEBUG /usr/bin/podman:
platform: linux/amd64
2025-03-26 13:15:21,187 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:21,196 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,228 7f93b227e740 DEBUG /usr/bin/podman:
59a865e3bcc5e86f6caed8278aec0cfed608bf89ff4953dfb48b762138955925,quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c31
72b0b23b37906,2bc0b0f4375ddf4270a9a865dfd4e53063acc8e6c3afd7a2546507cafd2ec86a,2025-03-25
16:52:31.731849052 -0400 EDT,
2025-03-26 13:15:21,260 7f93b227e740 DEBUG /usr/bin/podman:
[quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906
quay.io/ceph/ceph@sha256:ac06cdca6f2512a763f1ace85
53330e454152b82f95a2b6bf33c3f3ec2eeac77]
2025-03-26 13:15:21,385 7f93b227e740 DEBUG /usr/bin/podman: ceph
version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
2025-03-26 13:15:21,412 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:21,421 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,451 7f93b227e740 DEBUG /usr/bin/podman:
bfe17e83b28821be0ec399cde79965ade3bc3377c5acf05ef047395ddde4d804,quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c31
72b0b23b37906,2bc0b0f4375ddf4270a9a865dfd4e53063acc8e6c3afd7a2546507cafd2ec86a,2025-03-26
06:53:07.022104802 -0400 EDT,
2025-03-26 13:15:21,464 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:21,472 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,504 7f93b227e740 DEBUG /usr/bin/podman:
51b1d190dfb9a1db73b8efda020c54df4c339abce8973b8e0d6de2a2b780aa09,quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c31
72b0b23b37906,2bc0b0f4375ddf4270a9a865dfd4e53063acc8e6c3afd7a2546507cafd2ec86a,2025-03-25
16:52:31.726614643 -0400 EDT,
2025-03-26 13:15:21,516 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:21,524 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,557 7f93b227e740 DEBUG /usr/bin/podman:
dd3203f6f3bb3876ea35d8732c01211bb9cc79bff2258a7d63f425eb00e0221d,quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c31
72b0b23b37906,2bc0b0f4375ddf4270a9a865dfd4e53063acc8e6c3afd7a2546507cafd2ec86a,2025-03-25
16:52:31.898369305 -0400 EDT,
2025-03-26 13:15:21,570 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:21,579 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,611 7f93b227e740 DEBUG /usr/bin/podman:
34177c4e5761c9b1e232a7f4a854fa1c8fe187253503265998c9cadd2cb7625c,quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c31
72b0b23b37906,2bc0b0f4375ddf4270a9a865dfd4e53063acc8e6c3afd7a2546507cafd2ec86a,2025-03-25
16:52:33.635739799 -0400 EDT,
2025-03-26 13:15:21,623 7f93b227e740 DEBUG systemctl: enabled
2025-03-26 13:15:21,632 7f93b227e740 DEBUG systemctl: active
2025-03-26 13:15:21,662 7f93b227e740 DEBUG /usr/bin/podman:
65529d6ad1ac3c639ef699c2eed01b6a440e27925d5bccd5fb0eef50b283dab3,quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c31
72b0b23b37906,2bc0b0f4375ddf4270a9a865dfd4e53063acc8e6c3afd7a2546507cafd2ec86a,2025-03-25
16:52:31.726574789 -0400 EDT,
2025-03-26 13:16:14,190 7fa738df4740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:17:15,057 7f906b406740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:18:15,951 7f3141a37740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--image',
'quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906',
'--no-container-init', '--timeout', '895', 'ls']
2025-03-26 13:18:17,047 7feb94c28740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:19:18,797 7f23b641f740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:20:19,681 7f270b666740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:21:20,566 7fcd77be8740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'check-host']
2025-03-26 13:21:20,595 7fcd77be8740 INFO podman (/usr/bin/podman)
version 5.2.2 is present
2025-03-26 13:21:20,595 7fcd77be8740 INFO systemctl is present
2025-03-26 13:21:20,596 7fcd77be8740 INFO lvcreate is present
2025-03-26 13:21:20,635 7fcd77be8740 INFO Unit chronyd.service is
enabled and running
2025-03-26 13:21:20,635 7fcd77be8740 INFO Host looks OK
2025-03-26 13:21:21,016 7f670722d740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:22:21,860 7f29c27cb740 DEBUG
--------------------------------------------------------------------------------
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
2025-03-26 13:23:23,116 7f41cdc5d740 DEBUG
--------------------------------------------------------------------------------
[...] plus more of the same.
The mgr log for dell02 isn't very exciting except for frequent
exceptions where the dashboard cannot contact prometheus.
Is there a place I could post complete files without filling up the
mailing list?
On 3/26/25 13:23, Eugen Block wrote:
Ok, I'll try one last time and ask for cephadm.log output. ;-) And
the active MGR's log might help here as well.
Zitat von Tim Holloway <t...@mousetech.com>:
No change.
On 3/26/25 13:01, Tim Holloway wrote:
It's strange, but for a while I'd been trying to get prometheus
working on ceph08, so I don't know.
All I do know is immediately after editing the proxy settings I
got indications that those 2 OSDs had gone down.
What's REALLY strange is that their logs seem to hint that somehow
they shifted from administered to legacy configuration. That is,
looking for OSD resources under /var/lib/ceph instead of
/var/lib/ceph/<fsid>.
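A quick way to confirm that would be to check the "style" field in
the cephadm inventory on ceph06. Just a sketch (it assumes jq is
installed on the host):

cephadm ls | jq '.[] | select(.name | startswith("osd")) | {name, style, fsid}'

Adopted daemons should report "style": "cephadm:v1"; anything still
reporting "legacy" would line up with what the logs are hinting at.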
Anyway, I'll try yanking and re-deploying prometheus and maybe
that will magically cure something.
On 3/26/25 12:53, Eugen Block wrote:
Right, systemctl edit works as well. But I'm confused about the
down OSDs. Did you set the proxy on all hosts? Because the down
OSDs are on ceph06 while prometheus is supposed to run on dell02.
Are you sure those are related?
I would recommend removing the prometheus service entirely and
starting from scratch:
ceph orch rm prometheus
ceph mgr module disable prometheus
ceph mgr fail
Wait a minute, then enable it again and deploy prometheus:
ceph orch apply -i prometheus.yaml
ceph mgr module enable prometheus
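If you want to re-apply exactly the spec you already have, you can
export it first and feed the same file back in afterwards; roughly
(same commands as above, just as one sequence):

ceph orch ls prometheus --export > prometheus.yaml
ceph orch rm prometheus
ceph mgr module disable prometheus
ceph mgr fail
# wait a minute or so, then:
ceph mgr module enable prometheus
ceph orch apply -i prometheus.yaml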
Zitat von Tim Holloway <t...@mousetech.com>:
Since the containers are all podman, I found a "systemctl edit
podman" command that's recommended for setting a proxy for it.
However, once I did that, 2 OSDs went down and cannot be restarted.
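For reference, all that "systemctl edit" does is create a systemd
override drop-in roughly like this (the proxy host and port here are
placeholders, not the real ones):

[Service]
Environment="http_proxy=http://proxy.example.com:3128"
Environment="https_proxy=http://proxy.example.com:3128"
Environment="no_proxy=localhost,127.0.0.1,.internal.mousetech.com"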
In any event, before I did that, ceph health detail was returning
"HEALTH_OK".
Now I'm getting this:
HEALTH_ERR 2 failed cephadm daemon(s); Module 'prometheus' has
failed: gaierror(-2, 'Name or service not known'); too many PGs
per OSD (865 > max 560)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
daemon osd.3 on ceph06.internal.mousetech.com is in error state
daemon osd.2 on ceph08.internal.mousetech.com is in error state
[ERR] MGR_MODULE_ERROR: Module 'prometheus' has failed:
gaierror(-2, 'Name or service not known')
Module 'prometheus' has failed: gaierror(-2, 'Name or
service not known')
[WRN] TOO_MANY_PGS: too many PGs per OSD (865 > max 560)
On 3/26/25 12:07, Eugen Block wrote:
If you need a proxy to pull the images, I suggest setting it in
containers.conf:
cat /etc/containers/containers.conf
[engine]
env = ["http_proxy=<host>:<port>", "https_proxy=<host>:<port>",
"no_proxy=<your_no_proxy_list>"]
But again, you should be able to see a failed pull in the
cephadm.log on dell02. Or check 'ceph health detail'; usually it
warns you if the orchestrator failed to place a daemon.
Zitat von Tim Holloway <t...@mousetech.com>:
One thing I did run into when upgrading was TLS issues pulling
images. I had to set HTTP/S_PROXY and pull manually.
That may relate to this:
2025-03-26T10:52:16.547985+0000 mgr.dell02.zwnrme
(mgr.18015288) 23874 : cephadm [INF] Saving service prometheus
spec with placement dell02.mousetech.com
2025-03-26T10:52:16.560810+0000 mgr.dell02.zwnrme
(mgr.18015288) 23875 : cephadm [INF] Saving service
node-exporter spec with placement *
2025-03-26T10:52:16.572380+0000 mgr.dell02.zwnrme
(mgr.18015288) 23876 : cephadm [INF] Saving service
alertmanager spec with placement dell02.mousetech.com
2025-03-26T10:52:16.583555+0000 mgr.dell02.zwnrme
(mgr.18015288) 23878 : cephadm [INF] Saving service grafana
spec with placement dell02.mousetech.com
2025-03-26T10:52:16.601713+0000 mgr.dell02.zwnrme
(mgr.18015288) 23879 : cephadm [INF] Saving service
ceph-exporter spec with placement *
2025-03-26T10:52:44.139886+0000 mgr.dell02.zwnrme
(mgr.18015288) 23898 : cephadm [INF] Restart service mgr
2025-03-26T10:53:02.720157+0000 mgr.ceph08.tlocfi
(mgr.18043792) 7 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE
Bus STARTING
2025-03-26T10:53:02.824138+0000 mgr.ceph08.tlocfi
(mgr.18043792) 8 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE
Serving on http://10.0.1.58:8765
2025-03-26T10:53:02.962314+0000 mgr.ceph08.tlocfi
(mgr.18043792) 9 : cephadm [INF] [26/Mar/2025:10:53:02] ENGINE
Serving on https://10.0.1.58:7150
2025-03-26T10:53:02.962805+0000 mgr.ceph08.tlocfi
(mgr.18043792) 10 : cephadm [INF] [26/Mar/2025:10:53:02]
ENGINE Bus STARTED
2025-03-26T10:53:02.964966+0000 mgr.ceph08.tlocfi
(mgr.18043792) 11 : cephadm [ERR] [26/Mar/2025:10:53:02]
ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
File "/lib/python3.9/site-packages/cheroot/server.py", line
1823, in serve
self._connections.run(self.expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py",
line 203, in run
self._run(expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py",
line 246, in _run
new_conn = self._from_server_socket(self.server.socket)
File "/lib/python3.9/site-packages/cheroot/connections.py",
line 300, in _from_server_socket
s, ssl_env = self.server.ssl_adapter.wrap(s)
File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py",
line 277, in wrap
s = self.context.wrap_socket(
File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "/lib64/python3.9/ssl.py", line 1074, in _create
self.do_handshake()
File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed
(EOF) (_ssl.c:1133)
2025-03-26T10:53:03.471114+0000 mgr.ceph08.tlocfi
(mgr.18043792) 12 : cephadm [INF] Updating
ceph03.internal.mousetech.com:/etc/ceph/ceph.conf
On 3/26/25 11:39, Eugen Block wrote:
Then maybe the deployment did fail and we’re back to looking into
the cephadm.log.
Zitat von Tim Holloway <t...@mousetech.com>:
It returns nothing. I'd already done the same via "systemctl | grep
prometheus". There simply isn't a systemd service, even though
there should be.
On 3/26/25 11:31, Eugen Block wrote:
There’s a service called "prometheus", which can have multiple
daemons, just like any other service (mon, mgr, etc.). To get the
daemon logs you need to provide the daemon name
(prometheus.ceph02.andsopn), not just the service name
(prometheus).
Can you run the cephadm command I provided? It should show
something like what I pasted in the previous message.
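In practice that means looking up the full daemon name first and
then pulling its logs on the host it runs on, for example (the
daemon name below is just an illustration of the pattern):

ceph orch ps | grep prometheus
# then, on the host where that daemon lives:
cephadm logs --name prometheus.dell02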
Zitat von Tim Holloway <t...@mousetech.com>:
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24
Can't list daemon logs, run restart, etc., because "Error EINVAL:
No daemons exist under service name "prometheus". View currently
running services using "ceph orch ls"".
And yet, ceph orch ls shows prometheus as a service.
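In other words, the service spec exists but no daemon has ever been
placed for it:

ceph orch ls | grep prometheus    # the service is listed
ceph orch ps | grep prometheus    # no prometheus daemon comes back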
On 3/26/25 11:13, Eugen Block wrote:
ceph orch ls prometheus --export
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io