A few clarifications on our experience:

* We have 200+ rbd images mounted on our RBD-NFS gateway.  (There's
nothing easier for a user to understand than "your disk is full".)

* I'd expect more contention potential with a single shared RBD back
end, but with many distinct and presumably isolated backend RBD images,
I've always been surprised that *all* the nfsd tasks hang (a quick check
for this is sketched just below this list).  This leads me to think it's
an nfsd issue rather than an rbd issue.  (I realize this is an rbd list;
I'm just looking for shared experience. ;) )
 
* I haven't seen any difference between reads and writes.  Any access to
any backing RBD store from the NFS client hangs.
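
For reference, here's a minimal sketch of the sort of check I mean -- it
just walks /proc and reports the scheduler state and wait channel of every
nfsd thread.  The script name and the D-state reading are my own, nothing
from nfs-utils:

    #!/usr/bin/env python
    # check_nfsd_state.py -- report the scheduler state and wait channel
    # of every nfsd kernel thread, to confirm whether they are *all*
    # stuck in D (uninterruptible sleep) or only some of them.
    import os

    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/stat' % pid) as f:
                stat = f.read()
            with open('/proc/%s/wchan' % pid) as f:
                wchan = f.read().strip() or '-'
        except IOError:
            continue            # process exited while we were scanning
        # /proc/<pid>/stat looks like "1234 (nfsd) D ..."; pull out the
        # command name and the state field that follows it.
        comm = stat[stat.index('(') + 1:stat.rindex(')')]
        state = stat[stat.rindex(')') + 1:].split()[0]
        if comm == 'nfsd':
            print('%6s  state=%s  wchan=%s' % (pid, state, wchan))

When every thread shows state=D with an nfsd/svc or filesystem symbol in
wchan, that (to me) points at nfsd or the FS layer rather than the
individual RBD images.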

~jpr

On 10/22/2015 06:42 PM, Ryan Tokarek wrote:
>> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson <j...@uab.edu> wrote:
>>
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
>> nfsd server requests when their ceph cluster has a placement group that
>> is not servicing I/O for some reason, eg. too few replicas or an osd
>> with slow request warnings?
> We have experienced exactly that kind of problem except that it sometimes 
> happens even when ceph health reports "HEALTH_OK". This has been incredibly 
> vexing for us. 
>
>
> If the cluster is unhealthy for some reason, then I'd expect your/our 
> symptoms as writes can't be completed. 
>
> I'm guessing that you have file systems with barriers turned on. Whichever
> file system has a barrier write stuck on the problem pg will cause any
> other process trying to write anywhere in that FS to block as well. This
> likely means a cascade of nfsd processes will block as they each try to
> service various client writes to that FS. Even though, theoretically, the
> rest of the "disk" (rbd) and other file systems might still be writable,
> the NFS processes will still be in uninterruptible sleep just because of
> that stuck write request (or such is my understanding).
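>
> Not a tool we actually run, just a sketch of the kind of per-FS probe I
> mean: fsync a scratch file on each exported mount point from a child
> process with a timeout, and see which mount never comes back. The mount
> list and timeout below are made-up examples:
>
>     #!/usr/bin/env python
>     # fsync_probe.py -- fsync a small scratch file on each mount point
>     # in a child process with a timeout, to spot the file system whose
>     # flushes are stuck.
>     import os, multiprocessing
>
>     MOUNTS = ['/srv/export1', '/srv/export2']   # your NFS-exported mounts
>     TIMEOUT = 10                                # seconds before we call it stuck
>
>     def probe(mount):
>         path = os.path.join(mount, '.fsync_probe')
>         fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
>         try:
>             os.write(fd, b'x')
>             os.fsync(fd)            # blocks here if the FS is wedged
>         finally:
>             os.close(fd)
>             os.unlink(path)
>
>     if __name__ == '__main__':
>         for mount in MOUNTS:
>             p = multiprocessing.Process(target=probe, args=(mount,))
>             p.start()
>             p.join(TIMEOUT)
>             if p.is_alive():
>                 print('%s: fsync still blocked after %ss' % (mount, TIMEOUT))
>                 p.terminate()       # a child stuck in D won't actually die
>             else:
>                 print('%s: ok' % mount)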
>
> Disabling barriers on the gateway machine might postpone the problem (never 
> tried it and don't want to) until you hit your vm.dirty_bytes or 
> vm.dirty_ratio thresholds, but it is dangerous as you could much more easily 
> lose data. You'd be better off solving the underlying issues when they happen 
> (too few replicas available or overloaded osds). 
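>
> For reference, a quick way to see where those thresholds currently sit on
> the gateway (just a sketch reading procfs; nothing here is ceph-specific):
>
>     #!/usr/bin/env python
>     # Print the current writeback thresholds, i.e. how much dirty data
>     # the gateway will buffer before writers start blocking.
>     def read(path):
>         with open(path) as f:
>             return f.read().strip()
>
>     for name in ('dirty_ratio', 'dirty_bytes',
>                  'dirty_background_ratio', 'dirty_background_bytes'):
>         print('vm.%s = %s' % (name, read('/proc/sys/vm/%s' % name)))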
>
>
> For us, even when the cluster reports itself as healthy, we sometimes have 
> this problem. All nfsd processes block. sync blocks. echo 3 > 
> /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in 
> /proc/meminfo. None of the osds log slow requests. Everything seems fine on 
> the osds and mons. Neither CPU nor I/O load is extraordinary on the ceph 
> nodes, but at least one file system on the gateway machine will stop 
> accepting writes. 
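>
> A rough way to watch for that state (sketch only; the floor and the
> 5-minute window are arbitrary choices on my part, not something we've
> tuned):
>
>     #!/usr/bin/env python
>     # dirty_watch.py -- sample Dirty: from /proc/meminfo and warn when
>     # it never drains below a small floor for a whole window, i.e.
>     # writeback looks wedged.
>     import time
>
>     SAMPLE = 10        # seconds between samples
>     WINDOW = 30        # consecutive high samples (~5 minutes) before warning
>     FLOOR_KB = 1024    # writeback looks healthy if Dirty ever drops below this
>
>     def dirty_kb():
>         with open('/proc/meminfo') as f:
>             for line in f:
>                 if line.startswith('Dirty:'):
>                     return int(line.split()[1])
>         return 0
>
>     stuck = 0
>     while True:
>         kb = dirty_kb()
>         stuck = stuck + 1 if kb >= FLOOR_KB else 0
>         if stuck >= WINDOW:
>             print('%s Dirty stuck at %d kB for %d samples' %
>                   (time.strftime('%H:%M:%S'), kb, stuck))
>         time.sleep(SAMPLE)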
>
> If we just wait, the situation resolves itself in 10 to 30 minutes. A forced
> reboot of the NFS gateway "solves" the performance problem, but is annoying
> and dangerous (we unmount all of the file systems that can still be
> unmounted, but the stuck ones force us to a sysrq-b).
>
> This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running
> Ceph Firefly (0.80.10) and XFS file systems exported over NFS and Samba.
>
> Ryan
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
