A few clarifications on our experience:

* We have 200+ rbd images mounted on our RBD-NFS gateway. (There's
  nothing easier for a user to understand than "your disk is full".)

* I'd expect more contention potential with a single shared RBD back
  end, but with many distinct and presumably isolated backend RBD
  images, I've always been surprised that *all* the nfsd tasks hang.
  This leads me to think it's an nfsd issue rather than an rbd issue.
  (I realize this is an rbd list; I'm just looking for shared
  experience. ;) )

* I haven't seen any difference between reads and writes. Any access
  to any backing RBD store from the NFS client hangs. (A rough sketch
  of what we look at when it happens is below.)
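For anyone wanting to compare notes, a minimal sketch of the kind of
thing to check on the gateway when it wedges; nothing below is
specific to our setup:

    # nfsd threads stuck in uninterruptible sleep (D state)
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

    # which mounts are rbd-backed
    grep rbd /proc/mounts

    # anything the kernel has already flagged as hung
    dmesg | grep -i 'blocked for more than'

    # dump stacks of all blocked tasks to the kernel log (sysrq-w)
    echo w > /proc/sysrq-trigger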
~jpr

On 10/22/2015 06:42 PM, Ryan Tokarek wrote:
>> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson <j...@uab.edu> wrote:
>>
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways
>> blocking nfsd server requests when their ceph cluster has a
>> placement group that is not servicing I/O for some reason, e.g. too
>> few replicas or an osd with slow request warnings?
>
> We have experienced exactly that kind of problem, except that it
> sometimes happens even when ceph health reports "HEALTH_OK". This has
> been incredibly vexing for us.
>
> If the cluster is unhealthy for some reason, then I'd expect your/our
> symptoms, as writes can't be completed.
>
> I'm guessing that you have file systems with barriers turned on.
> Whichever file system has a barrier write stuck on the problem pg
> will cause any other process trying to write anywhere in that FS to
> block as well. This likely means a cascade of nfsd processes will
> block as they each try to service various client writes to that FS.
> Even though, theoretically, the rest of the "disk" (rbd) and other
> file systems might still be writable, the NFS processes will still be
> in uninterruptible sleep just because of that stuck write request (or
> such is my understanding).
>
> Disabling barriers on the gateway machine might postpone the problem
> (never tried it and don't want to) until you hit your vm.dirty_bytes
> or vm.dirty_ratio thresholds, but it is dangerous, as you could much
> more easily lose data. You'd be better off solving the underlying
> issues when they happen (too few replicas available or overloaded
> osds).
>
> For us, even when the cluster reports itself as healthy, we sometimes
> have this problem. All nfsd processes block. sync blocks. "echo 3 >
> /proc/sys/vm/drop_caches" blocks. There is a persistent 4-8MB "Dirty"
> in /proc/meminfo. None of the osds log slow requests. Everything
> seems fine on the osds and mons. Neither CPU nor I/O load is
> extraordinary on the ceph nodes, but at least one file system on the
> gateway machine will stop accepting writes.
>
> If we just wait, the situation resolves itself in 10 to 30 minutes. A
> forced reboot of the NFS gateway "solves" the performance problem,
> but is annoying and dangerous (we unmount all of the file systems
> that can still be unmounted, but the stuck ones lead us to a
> sysrq-b).
>
> This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels
> running Ceph Firefly (0.80.10) and XFS file systems exported over NFS
> and samba.
>
> Ryan
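In case it helps anyone following the barrier/writeback angle Ryan
describes above, a minimal sketch of how to check whether those pieces
are in play on a gateway; nothing below is specific to our setup:

    # exported XFS mounts: barriers are on by default, so 'nobarrier'
    # only shows up here if someone has turned them off
    grep xfs /proc/mounts

    # current writeback thresholds and how much dirty data is pending
    sysctl vm.dirty_ratio vm.dirty_background_ratio
    sysctl vm.dirty_bytes vm.dirty_background_bytes
    grep -E 'Dirty|Writeback' /proc/meminfo

    # and from a ceph admin node, whether anything is actually stuck
    # at the cluster level
    ceph health detail
    ceph -s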