I can no longer run the Ceph containers on the master node (blade3n1). They
are simply not started, and there is no error message. Here is what cephadm ls says:
mixtile@blade3n1:~$ sudo cephadm ls
[
{
"style": "cephadm:v1",
"name": "mon.blade3n1",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@mon.blade3n1",
"enabled": true,
"state": "error",
"service_name": "mon",
"memory_request": null,
"memory_limit": null,
"ports": [],
"container_id": null,
"container_image_name": "quay.io/ceph/ceph:v19",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-16T14:35:47.634066Z",
"deployed": "2026-04-16T14:35:45.414037Z",
"configured": "2026-04-20T17:16:32.722329Z"
},
{
"style": "cephadm:v1",
"name": "node-exporter.blade3n1",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@node-exporter.blade3n1",
"enabled": true,
"state": "error",
"service_name": "node-exporter",
"ports": [
9100
],
"ip": null,
"deployed_by": [
"quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
],
"rank": null,
"rank_generation": null,
"extra_container_args": null,
"extra_entrypoint_args": null,
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": "quay.io/prometheus/node-exporter:v1.7.0",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-21T12:55:39.731035Z",
"deployed": "2026-04-21T12:55:38.217675Z",
"configured": "2026-04-21T12:55:39.734369Z"
},
{
"style": "cephadm:v1",
"name": "ceph-exporter.blade3n1",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@ceph-exporter.blade3n1",
"enabled": true,
"state": "error",
"service_name": "ceph-exporter",
"ports": [],
"ip": null,
"deployed_by": [
"quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
],
"rank": null,
"rank_generation": null,
"extra_container_args": null,
"extra_entrypoint_args": null,
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-16T14:37:32.218782Z",
"deployed": "2026-04-16T14:37:30.612094Z",
"configured": "2026-04-20T17:16:36.139048Z"
},
{
"style": "cephadm:v1",
"name": "mgr.blade3n1.rrlwwv",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@mgr.blade3n1.rrlwwv",
"enabled": true,
"state": "error",
"service_name": "mgr",
"memory_request": null,
"memory_limit": null,
"ports": [
9283,
8765,
8443
],
"container_id": null,
"container_image_name": "quay.io/ceph/ceph:v19",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-16T14:35:54.054151Z",
"deployed": "2026-04-16T14:35:52.430796Z",
"configured": "2026-04-20T17:16:37.612403Z"
},
{
"style": "cephadm:v1",
"name": "crash.blade3n1",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@crash.blade3n1",
"enabled": true,
"state": "error",
"service_name": "crash",
"ports": [],
"ip": null,
"deployed_by": [
"quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
],
"rank": null,
"rank_generation": null,
"extra_container_args": null,
"extra_entrypoint_args": null,
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-16T14:37:36.855510Z",
"deployed": "2026-04-16T14:37:35.268822Z",
"configured": "2026-04-20T17:16:39.025758Z"
},
{
"style": "cephadm:v1",
"name": "osd.3",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@osd.3",
"enabled": true,
"state": "error",
"service_name": "osd",
"ports": [],
"ip": null,
"deployed_by": [
"quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
],
"rank": null,
"rank_generation": null,
"extra_container_args": null,
"extra_entrypoint_args": null,
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-23T15:05:00.686688Z",
"deployed": "2026-04-23T15:04:59.176667Z",
"configured": "2026-04-23T15:05:00.693355Z"
},
{
"style": "cephadm:v1",
"name": "mds.data.blade3n1.eczeqc",
"fsid": "8aad3073-39a1-11f1-bf6e-f2704a1efa9b",
"systemd_unit": "ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b@mds.data.blade3n1.eczeqc",
"enabled": true,
"state": "error",
"service_name": "mds.data",
"ports": [],
"ip": null,
"deployed_by": [
"quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee"
],
"rank": null,
"rank_generation": null,
"extra_container_args": null,
"extra_entrypoint_args": null,
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": "quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2026-04-16T15:54:13.264224Z",
"deployed": "2026-04-16T15:54:10.870858Z",
"configured": "2026-04-20T17:16:40.499113Z"
}
]
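For reference, the systemd_unit values in this listing follow the pattern ceph-<fsid>@<daemon name>, and systemd adds the .service suffix. Below is a minimal shell sketch, assuming that pattern holds for every daemon on the node, which prints (rather than runs) a journal query per daemon. The helper name ceph_unit is made up for illustration, and only three of the daemons are listed.

```shell
# Hypothetical helper: build the systemd unit name for a cephadm daemon,
# assuming the pattern ceph-<fsid>@<daemon name>.service.
ceph_unit() {
    printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# fsid as it appears in the cephadm ls output
FSID=8aad3073-39a1-11f1-bf6e-f2704a1efa9b

# Print (do not run) one journalctl command per daemon of interest.
for daemon in mon.blade3n1 mgr.blade3n1.rrlwwv osd.3; do
    echo "journalctl -b --no-pager -n 100 -u $(ceph_unit "$FSID" "$daemon")"
done
```

Running one of the printed commands as root shows what systemd recorded for that unit in the current boot. As far as I know, 'cephadm logs --name mon.blade3n1' wraps a similar journal query.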
On Wed, 27 May 2026 at 15:07, Jacek Rużyczka <[email protected]> wrote:
Hi Eugen,
> You might need to run 'systemctl reset-failed...' to let systemd start the
> containers.
I've already done that, to no avail. Even worse, Docker no longer starts on
node #1. When I try to restart the daemon, I get errors like this:
docker.service: Failed with result 'core-dump'.
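A result of 'core-dump' means systemd saw the service process terminate with a signal and dump core, so the journal and any coredump records are the obvious places to look first. Below is a minimal sketch, assuming a systemd host where journalctl and possibly coredumpctl (from systemd-coredump) are installed. run_if_present and docker_diag are hypothetical helper names.

```shell
# Hypothetical helper: run a command only if its binary is installed,
# otherwise say that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not installed"
    fi
}

# Hypothetical helper: collect what systemd knows about the dockerd crash.
docker_diag() {
    # Last 50 journal lines for the Docker unit from the current boot
    run_if_present journalctl -b --no-pager -n 50 -u docker.service
    # Core dumps recorded for dockerd, if systemd-coredump is in use
    run_if_present coredumpctl --no-pager list dockerd
}
```

Calling docker_diag as root prints both outputs. Stopping the unit and running 'dockerd --debug' in the foreground may also show where the daemon crashes.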
> But before you do that, do you have MON logs with an explanation why they
> refuse to start?
Unfortunately not, not even in the syslog. In the meantime, I was able to
start another MON via cephadm (because the Docker instance had even deleted
the image), but I still have the problem with the one node where Docker
refuses to start.
> Regarding Ceph images, your cluster uses af0c5903e901 for the Ceph
> services, what does 'docker images | grep af0c5903e901' show?
On the affected node, nothing, because the Docker daemon won't even start.
> I have the impression that this is a "regular" cephadm cluster
True
BTW, when running the test script supplied by the Docker guys
(https://docs.docker.com/engine/daemon/troubleshoot/), I get some warnings:
- Network Drivers:
- "bridge":
- sysctl net.ipv4.ip_forward: disabled
- sysctl net.ipv6.conf.all.forwarding: disabled
- sysctl net.ipv6.conf.default.forwarding: disabled
On nodes #2 through #4, net.ipv4.ip_forward is enabled.
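To compare the nodes directly, the three settings can be read from /proc/sys. Below is a minimal sketch; forwarding_state is a hypothetical helper name. As far as I know, dockerd enables net.ipv4.ip_forward itself at startup by default, so a disabled value on a node where the daemon never comes up may be a symptom rather than the cause.

```shell
# Hypothetical helper: report the state of a forwarding sysctl file.
# The path is a parameter so the function can be tried on any file.
forwarding_state() {
    if [ ! -r "$1" ]; then
        echo unknown
    elif [ "$(cat "$1")" = "1" ]; then
        echo enabled
    else
        echo disabled
    fi
}

# The three settings named in the warnings
for f in /proc/sys/net/ipv4/ip_forward \
         /proc/sys/net/ipv6/conf/all/forwarding \
         /proc/sys/net/ipv6/conf/default/forwarding; do
    echo "$f: $(forwarding_state "$f")"
done

# To enable IPv4 forwarding until the next reboot (needs root):
#   sysctl -w net.ipv4.ip_forward=1
```

Running this on node #1 and on one of the working nodes gives a side-by-side view of the same three values.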
Regards
Jacek Rużyczka