Re: [ceph-users] long blocking with writes on rbds

Jeff Epstein Wed, 22 Apr 2015 21:27:19 -0700

Do you have some idea how I can diagnose this problem?
I'd look at ceph -s output while you have these stuck processes to see if there's any unusual activity (scrub/deep-scrub/recovery/backfills/...). Is it correlated in any way with rbd removal (i.e. write blocking doesn't appear unless you removed at least one rbd for, say, one hour before the write performance problems)?
I'm not familiar with Amazon VMs. If you map the rbds using the kernel driver to local block devices, do you have control over the kernel you run? (I've seen reports of various problems with older kernels, and you probably want the latest possible.)
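
One way to automate the check suggested above is to poll the cluster status and pick out placement-group states that mention scrubbing, recovery, or backfill. The following is only a rough sketch: it assumes that `ceph status --format json` returns a `pgmap` object with a `pgs_by_state` list of `state_name`/`count` entries, which may differ between Ceph releases. The helper names and the marker list are illustrative and not part of Ceph.

```python
import json
import subprocess

# Substrings of PG state names that indicate background activity.
# This list is an assumption chosen for illustration.
BUSY_MARKERS = ("scrub", "recover", "backfill")


def busy_pg_states(status):
    """Return {state_name: count} for PG states showing background activity.

    `status` is the parsed JSON status; the pgmap/pgs_by_state layout is
    assumed and should be checked against your Ceph version.
    """
    pgs = status.get("pgmap", {}).get("pgs_by_state", [])
    return {
        s["state_name"]: s["count"]
        for s in pgs
        if any(marker in s["state_name"] for marker in BUSY_MARKERS)
    }


def fetch_status():
    """Run the ceph CLI and parse its JSON status output."""
    out = subprocess.check_output(["ceph", "status", "--format", "json"])
    return json.loads(out)
```

Calling `busy_pg_states(fetch_status())` in a loop while a write is stuck would show whether the stall coincides with scrub, recovery, or backfill; an empty result means none of those states were reported at that moment.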

ceph status shows nothing unusual. However, on the problematic node, we typically see entries in ps like this:


 1468 12329 root     D     0.0 mkfs.ext4       wait_on_page_bit
 1468 12332 root     D     0.0 mkfs.ext4       wait_on_buffer

Notice the "D" (uninterruptible sleep) state. Here, mkfs is stuck in kernel wait functions for long periods of time. (Also, we are formatting the RBDs as ext4 even though the OSDs are xfs; I assume this shouldn't be a problem?)
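
For anyone who wants to capture the same information repeatedly, here is a minimal sketch that walks /proc and lists processes in the "D" state together with their wait channel, roughly what the ps output above shows. It assumes a Linux /proc layout; the function names are made up for this example, and reading /proc/<pid>/wchan may return "0" or nothing depending on kernel configuration and permissions.

```python
import os


def parse_stat(stat_text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if the process vanished or access is denied."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_processes(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Running this from cron or a shell loop during an mkfs would give timestamps for when the block starts and ends, which could then be compared against cluster activity.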

We're on kernel 3.18.4pl2, which is pretty recent. Still, an outdated kernel driver isn't out of the question; if anyone has any concrete information, I'd be grateful.


Jeff
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
