On 05/06 14:03, mabi wrote:
> Hello,
>
> I have a small 6 nodes Octopus 15.2.11 cluster installed on bare metal with
> cephadm and I added a second OSD to one of my 3 OSD nodes. I started then
> copying data to my ceph fs mounted with kernel mount but then both OSDs on
> that specific nodes crashed.
>
> To this topic I have the following questions:
>
> 1) How can I find out why the two OSD crashed? because everything is in
> podman containers I don't know where are the logs to find out the reason why
> this happened. From the OS itself everything looks ok, there was no out of
> memory error.

There should be some logs under /var/log/ceph/<cluster_fsid>/osd.<osd_id>/ on
the host or hosts that were running the OSDs.

I have sometimes removed the '--rm' flag for the container in the 'unit.run'
script at /var/lib/ceph/<ceph_fsid>/osd.<id>/unit.run, so that podman keeps the
container around and I can run 'podman logs' on it. That is probably sensible
only while debugging, though.

>
> 2) I would assume the two OSD container would restart on their own but this
> is not the case it looks like. How can I restart manually these 2 OSD
> containers on that node? I believe this should be a "cephadm orch" command?

I think 'ceph orch daemon redeploy' might do it?

What is the output of 'ceph orch ls' and 'ceph orch ps'?

>
> The health of the cluster right now is:
>
> CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
> PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded
> (33.333%), 65 pgs degraded, 65 pgs undersized
>
> Thank your for your hints.
>
> Best regards,
> Mabi
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
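
The suggestions in the reply above could be gathered into a short shell sketch along these lines. It only prints the commands instead of running them, so they can be reviewed first. FSID and OSD_ID are placeholders (not values from this thread), and the helper names osd_unit_run, osd_unit and show_plan are made up for illustration. The journalctl, 'cephadm logs', 'ceph crash ls' and 'ceph orch daemon restart' lines are additions that were not mentioned in the thread; they assume a standard cephadm deployment where each daemon runs under a systemd unit named ceph-<fsid>@osd.<id>.service.

```shell
#!/bin/sh
# Sketch only: FSID and OSD_ID are placeholders, replace them with real values
# (the fsid is shown by 'ceph fsid', the OSD ids by 'ceph orch ps').
FSID="${FSID:-00000000-0000-0000-0000-000000000000}"
OSD_ID="${OSD_ID:-0}"

# Path of the per-daemon unit.run script mentioned in the reply.
osd_unit_run() { printf '/var/lib/ceph/%s/osd.%s/unit.run\n' "$1" "$2"; }

# Assumed systemd unit name for a cephadm-managed OSD.
osd_unit() { printf 'ceph-%s@osd.%s.service\n' "$1" "$2"; }

# Print (do not execute) the commands to look at logs and restart the daemon.
show_plan() {
  fsid=$1
  id=$2
  cat <<EOF
# 1) Why did the OSD crash?
ls /var/log/ceph/$fsid/
journalctl -u $(osd_unit "$fsid" "$id") --no-pager -n 200
cephadm logs --fsid $fsid --name osd.$id
ceph crash ls
# Debugging only: edit this file to drop '--rm', then use 'podman logs'
$(osd_unit_run "$fsid" "$id")

# 2) What does the orchestrator see, and how to bring the OSD back?
ceph orch ls
ceph orch ps
ceph orch daemon restart osd.$id
ceph orch daemon redeploy osd.$id
EOF
}

show_plan "$FSID" "$OSD_ID"
```

Trying 'restart' before 'redeploy' is a judgment call here: a restart leaves the daemon's container and configuration untouched, whereas a redeploy recreates them.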