On Wed, Aug 25, 2021 at 7:02 AM Paul Giralt (pgiralt) <pgir...@cisco.com> wrote:
>
> I upgraded to Pacific 16.2.5 about a month ago and everything was working 
> fine. Suddenly for the past few days I’ve started having the tcmu-runner 
> container on my iSCSI gateways just disappear. I’m assuming this is because 
> they have crashed. I deployed the services using cephadm / ceph orch in 
> Docker containers.
>
> It appears that when the service crashes, the container just disappears and 
> it doesn’t look like tcmu-runner is exporting logs anywhere, so I can’t 
> figure out any way to determine the root cause of these failures. When this 
> happens, it appears to cause issues where I can’t reboot the machine (Running 
> CentOS 8) and I need to power-cycle the server to recover.
>
> I’m really not sure where to look to figure out why it’s suddenly failing. 
> The failure is happening randomly on all 4 of the iSCSI gateways. Any 
> pointers would be greatly appreciated.

Hi Paul,

Does the node hang while shutting down or does it lock up so that you
can't even issue the reboot command?

The first places to look are dmesg and "systemctl status".  cephadm
wraps each service in a systemd unit, so there should be a record of
it terminating there.  If tcmu-runner is indeed crashing, Xiubo (CCed)
might be able to help with debugging.
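A minimal sketch of collecting that evidence in one pass, assuming
cephadm's usual systemd unit naming of ceph-<fsid>@<daemon_name>.service.
The fsid and daemon name are placeholders you supply yourself ("ceph fsid"
and "ceph orch ps --daemon-type iscsi" should show them).  It also assumes
tcmu-runner's container is started from the iSCSI daemon's unit, so that
unit's status and journal are the ones to read; dmesg may need root.

```python
"""Collect basic crash evidence for a cephadm-managed daemon (sketch).

Assumption: cephadm names its systemd units ceph-<fsid>@<daemon_name>.service.
The fsid and daemon name are placeholders passed on the command line.
"""
import subprocess
import sys
from typing import List


def unit_name(fsid: str, daemon_name: str) -> str:
    """Build the assumed cephadm systemd unit name for a daemon."""
    return f"ceph-{fsid}@{daemon_name}.service"


def diagnostic_commands(unit: str) -> List[List[str]]:
    """Commands to run, in order: unit status, recent unit journal, kernel log."""
    return [
        ["systemctl", "status", "--no-pager", unit],
        ["journalctl", "-u", unit, "--no-pager", "-n", "200"],
        ["dmesg", "-T"],
    ]


def main(argv: List[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} <fsid> <daemon_name>", file=sys.stderr)
        return 2
    unit = unit_name(argv[1], argv[2])
    for cmd in diagnostic_commands(unit):
        print(f"### {' '.join(cmd)}")
        try:
            # check=False: "systemctl status" exits non-zero for a failed
            # unit, which is exactly the case we want to inspect.
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        except FileNotFoundError:
            print(f"{cmd[0]} not found on this host", file=sys.stderr)
            continue
        print(result.stdout)
        if result.stderr:
            print(result.stderr, file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it on the affected gateway as root, e.g.
"python3 collect.py <fsid> <iscsi daemon name>", ideally right after the
container disappears and before power-cycling, since the kernel log is lost
on reboot unless the journal is persistent.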

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
