On 05/06 14:03, mabi wrote:
> Hello,
> 
> I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with 
> cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started 
> copying data to my CephFS (kernel mount), but then both OSDs on that 
> specific node crashed.
> 
> To this topic I have the following questions:
> 
> 1) How can I find out why the two OSDs crashed? Because everything is in 
> podman containers, I don't know where the logs are to find out why this 
> happened. From the OS itself everything looks OK; there was no out-of-memory 
> error.

There should be some logs under /var/log/ceph/<cluster_fsid>/osd.<osd_id>/ on 
the host(s) that were running the OSDs.
When debugging, I have also sometimes removed the '--rm' flag from the pod 
invocation in the 'unit.run' script at 
/var/lib/ceph/<cluster_fsid>/osd.<id>/unit.run, so that podman persists the 
stopped container and 'podman logs' works on it.
That is probably sensible only while debugging, though.
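The unit.run trick above can be sketched roughly like this (hedged: the real 
file is /var/lib/ceph/<cluster_fsid>/osd.<id>/unit.run; the edit is 
demonstrated here on a throwaway copy with a made-up podman run line, so 
substitute your own path and fsid):

```shell
# Sketch only: strip podman's '--rm' flag from an OSD's unit.run so the
# container is kept after a crash and 'podman logs' works on it.
# Demonstrated on a temp file; the invented run line is an assumption.
UNIT=$(mktemp)   # in real use: /var/lib/ceph/<cluster_fsid>/osd.<id>/unit.run
printf '%s\n' '/usr/bin/podman run --rm --ipc=host --net=host ceph/ceph:v15 osd' > "$UNIT"

# Remove the --rm flag so podman does not delete the exited container:
sed -i 's/ --rm//' "$UNIT"
cat "$UNIT"

# After editing the real unit.run: restart the OSD's systemd unit, let it
# crash again, then 'podman ps -a' to find the exited container and
# 'podman logs <container-id>' to see why the OSD died.
```

Remember to restore the original unit.run (or redeploy the daemon) once you 
are done debugging.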

> 
> 2) I would have assumed the two OSD containers would restart on their own, 
> but that does not seem to be the case. How can I manually restart these 2 
> OSD containers on that node? I believe this should be a "cephadm orch" 
> command?

I think 'ceph orch daemon redeploy' might do it? What is the output of 'ceph 
orch ls' and 'ceph orch ps'?
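A few commands along those lines that may help (hedged: exact behaviour can 
vary by Octopus point release, and <fsid>/<id> are placeholders to fill in 
from 'ceph fsid' and 'ceph osd tree'):

```shell
ceph orch ps --daemon-type osd      # which OSD daemons cephadm thinks are down
ceph orch daemon restart osd.<id>   # ask the orchestrator to restart one daemon
ceph orch daemon redeploy osd.<id>  # or redeploy it if a restart is not enough
# cephadm daemons are also plain systemd units on the host, so this works too:
systemctl restart ceph-<fsid>@osd.<id>.service
```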
> 
> The health of the cluster right now is:
> 
>     CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
>     PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded 
> (33.333%), 65 pgs degraded, 65 pgs undersized
> 
> Thank you for your hints.
> 
> Best regards,
> Mabi
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
