Hi Jan,

When this happens, does it affect a single OSD only? That is, are the other OSDs on the same node still working fine? Is there any useful system-level information (dmesg) when it happens?

Could it be that the ms_async_rdma_local_gid for this OSD is no longer available or has changed? [1]

Does your setup match the config options (systemd unit files, /etc/security/limits.conf, ceph config) mentioned in this article [2], and the use of RoCE with Jumbo Frames disabled [3][4]?
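To check the GID question above, something like the following might help. It is only a rough sketch, not something I have run against an RDMA cluster: it assumes the kernel exposes the GID table under /sys/class/infiniband/<device>/ports/<port>/gids/<index>, and the function name find_gid is made up for illustration. It normalizes both sides as IPv6-style addresses so that compressed and fully expanded forms compare equal.

```python
import ipaddress
from pathlib import Path


def find_gid(gid, sysfs_root="/sys/class/infiniband"):
    """Return (device, port, index) for each GID table entry equal to gid.

    Hypothetical helper: the sysfs layout is an assumption about the host.
    """
    wanted = ipaddress.IPv6Address(gid.strip())
    root = Path(sysfs_root)
    matches = []
    for gid_file in sorted(root.glob("*/ports/*/gids/*")):
        try:
            value = ipaddress.IPv6Address(gid_file.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed entries are skipped.
            continue
        if value == wanted:
            # Relative path looks like <device>/ports/<port>/gids/<index>.
            device, _, port, _, index = gid_file.relative_to(root).parts
            matches.append((device, int(port), int(index)))
    return matches
```

You would pass it the value configured for the failing OSD (for example, whatever `ceph config get osd.<id> ms_async_rdma_local_gid` returns). An empty result would suggest the GID is no longer present on the node, though I would still confirm that by hand.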
As you might already know, you'll get limited support from the community on RDMA issues, because few Ceph users run RDMA for low-latency networking. I'm not even sure RDMA in Ceph ever got beyond the experimental phase. Maybe someone can shed some light on this.

If your workloads really need RDMA, I would advise you to raise the ms async log levels to debug (along with any other RDMA-related log levels, if any) and open a Ceph tracker issue.

Sorry I can't help much when it comes to RDMA, as I have never used it myself.

Regards,
Frédéric.

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/EHGZ2XFTNWBU7Z32NDOGKVB2I2CV57KH/
[2] https://www.stackhpc.com/ceph-on-the-brain-a-year-with-the-human-brain-project.html
[3] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/5JD4ATRXKMMLIUQI5TUAUYQFGJ45Q7MJ/
[4] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ETNIF52ULSF73F5EEXU3HQ5HK2CLOSEP/

----- On 3 June 25, at 11:10, Jan Marek jma...@jcu.cz wrote:

> Hello,
>
> we are using CEPH version 19.2.0...
>
> Sincerely
> Jan Marek
> --
> Ing. Jan Marek
> University of South Bohemia
> Academic Computer Centre
> Phone: +420389032080
> http://www.gnu.org/philosophy/no-word-attachments.cs.html
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io