Hello Frédéric, thanks for your reply. My answers are inline below...
On Wed, Jun 04, 2025 at 12:38:39 CEST, Frédéric Nass wrote:
> Hi Jan,
>
> When this happens, does it happen to a single OSD only?

Yes, only one OSD.

> Like other OSDs on the same node are still working fine?

Yes, the other OSDs on this machine work and perform well.

> Any useful information system-wise (dmesg) when it happens?

I've tried to check, but there was nothing interesting...

> Could it be that the ms_async_rdma_local_gid for this OSD is no longer
> available / has changed? [1]

It cannot have changed - it's statically set for every machine in the cluster. It is derived from the IP address of the network card and must be the same for every OSD on a given machine...

> Does your setup match the config options (systemd unit files,
> /etc/security/limits.conf, ceph config) mentioned in this article [2] and the
> use of RoCE with Jumbo Frames disabled [3][4]?

We have Jumbo Frames enabled - we need them - and we did not set unlimited memlock... Thanks for pointing me to these docs...

> As you might know already, you'll get limited support from the community when
> it comes to RDMA issues due to the limited number of users of Ceph with RDMA
> for low-latency networking. I'm not even sure RDMA in Ceph got beyond the
> experimental phase. Maybe someone can shed some light on this.

We expected that an all-flash Ceph cluster serving virtual machine disk images for Proxmox, with databases and so on, would profit from the low latency of RDMA traffic... Our normal traffic to this cluster is around 100 MB/s read and 40 MB/s write with practically zero latency...

> If your workloads really need RDMA, I would advise you to push any ms async
> log levels to debug (and any other RDMA related log levels if any) and create
> a Ceph tracker.

We will try this, thanks.

> Sorry I can't help much when it comes to RDMA as I never used it myself.
>
> Regards,
> Frédéric.
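For anyone following along: since we had not set unlimited memlock, this is roughly what the article [2] suggests - a sketch only, assuming a systemd-managed ceph-osd@ unit (the drop-in path, the `ceph` user name and the example OSD id are assumptions; check your own deployment):

```ini
# Hypothetical drop-in: /etc/systemd/system/ceph-osd@.service.d/rdma.conf
# RDMA pins queue-pair buffers in RAM, so the OSD process needs an
# unlimited locked-memory allowance.
[Service]
LimitMEMLOCK=infinity

# /etc/security/limits.conf equivalent, for processes started outside
# systemd (uncomment and adjust the user name):
#   ceph  soft  memlock  unlimited
#   ceph  hard  memlock  unlimited

# ceph.conf fragment to push messenger logging to debug on the affected
# OSD, as Frédéric suggested ("osd.7" is just an example id):
# [osd.7]
#     debug ms = 20/20
```

After `systemctl daemon-reload` and an OSD restart the limit should apply; the messenger debug level can also be raised at runtime, e.g. with `ceph tell osd.N config set debug_ms 20/20` for the affected OSD.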
> [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/EHGZ2XFTNWBU7Z32NDOGKVB2I2CV57KH/
> [2] https://www.stackhpc.com/ceph-on-the-brain-a-year-with-the-human-brain-project.html
> [3] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/5JD4ATRXKMMLIUQI5TUAUYQFGJ45Q7MJ/
> [4] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ETNIF52ULSF73F5EEXU3HQ5HK2CLOSEP/
>
> ----- On Jun 3, 25, at 11:10, Jan Marek jma...@jcu.cz wrote:
>
> > Hello,
> >
> > we are using CEPH version 19.2.0...
> >
> > Sincerely
> > Jan Marek

--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io