On Thu, Jun 29, 2017 at 10:30 AM Nick Fisk <n...@fisk.me.uk> wrote:
> Hi All,
>
> Putting out a call for help to see if anyone can shed some light on this.
>
> Configuration:
> Ceph cluster presenting RBDs->XFS->NFS->ESXi
> Running 10.2.7 on the OSDs and a 4.11 kernel on the NFS gateways in a
> pacemaker cluster
> Both OSDs and clients go into a pair of switches, single L2 domain (no
> sign from pacemaker that there are network connectivity issues)
>
> Symptoms:
> - All RBDs on a single client randomly hang for 30s to several minutes,
>   confirmed by pacemaker and ESXi hosts complaining
> - Cluster load is minimal most times when this happens
> - All other clients with RBDs are not affected (same RADOS pool), so it
>   seems more of a client issue than a cluster issue
> - It looks like pacemaker also tries to stop the RBD+FS resource, but this
>   also hangs
> - Eventually pacemaker succeeds in stopping resources and immediately
>   restarts them, IO returns to normal
> - No errors, slow requests, or any other non-normal Ceph status is reported
>   on the cluster or in ceph.log
> - Client logs show nothing apart from pacemaker
>
> Things I've tried:
> - Different kernels (potentially happened less with older kernels, but
>   can't be 100% sure)
> - Disabling scrubbing and anything else that could be causing high load
> - Enabling kernel RBD debugging (problem maybe happens a couple of times a
>   day, debug logging was not practical as I can't reproduce on demand)
>
> Anyone have any ideas?

Nick, are you using any network aggregation, such as LACP? Can you drop to
the simplest possible configuration to make sure there's nothing on the
network switch side?

Did you check ceph.log for any anomalies?

Any occurrences on the OSD nodes, anything in their OSD logs or syslogs?

Any odd page cache settings on the clients?

Alex

>
> Thanks,
> Nick
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
--
Alex Gorbachev
Storcium