Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-23 Thread Hall, Eric
http://tracker.ceph.com/issues/20393 created with supporting logs/info noted. -- Eric On 6/23/17, 7:54 AM, "Jason Dillaman" wrote: On Fri, Jun 23, 2017 at 8:47 AM, Hall, Eric wrote: > I have debug logs. Should I open a RBD tracker ticket at http://tracker.ceph.com/projects/rbd/issu

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-23 Thread Jason Dillaman
On Fri, Jun 23, 2017 at 8:47 AM, Hall, Eric wrote: > I have debug logs. Should I open a RBD tracker ticket at > http://tracker.ceph.com/projects/rbd/issues for this? Yes, please. You might need to use the "ceph-post-file" utility if the logs are too large to attach to the ticket. In that case,
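
For reference, a minimal sketch of the ceph-post-file step mentioned above (the description text and log path are illustrative, not from the thread):

    # Upload a debug log that is too large to attach to the tracker ticket;
    # the command prints a UUID that can be pasted into the ticket instead.
    ceph-post-file -d "rbd IO hang, librbd debug 20 log" /var/log/ceph/qemu-guest-12345.log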

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-23 Thread Hall, Eric
Only features enabled are layering and deep-flatten:

    root@cephproxy01:~# rbd -p vms info c9c5db8e-7502-4acc-b670-af18bdf89886_disk
    rbd image 'c9c5db8e-7502-4acc-b670-af18bdf89886_disk':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.f4e
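
A quick way to confirm just the feature list on the same image (the grep is a convenience, not from the thread):

    # Show only the features line of the image metadata.
    rbd -p vms info c9c5db8e-7502-4acc-b670-af18bdf89886_disk | grep features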

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-23 Thread Hall, Eric
The problem seems to be reliably reproducible after a fresh reboot of the VM… With this knowledge, I can cause the hung IO condition while having noscrub and nodeepscrub set. Does this confirm this is not related to http://tracker.ceph.com/issues/20041 ? -- Eric On 6/22/17, 11:23 AM, "Hall, E
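
A minimal sketch of the noscrub/nodeepscrub test described here (note the CLI spells the second flag nodeep-scrub):

    # Disable scrubbing cluster-wide, reboot the VM and drive IO until it
    # hangs, then clear the flags again afterwards.
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ... reproduce the hung IO condition ...
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub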

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-23 Thread Jason Dillaman
Yes, I'd say they aren't related. Since you can repeat this issue after a fresh VM boot, can you enable debug-level logging for said VM (add "debug rbd = 20" to your ceph.conf) and recreate the issue? Just to confirm, this VM doesn't have any features enabled besides (perhaps) layering? On Fri, Ju
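
A sketch of the requested logging change, assuming it is made in the hypervisor's ceph.conf and the qemu process is restarted afterwards (the log file path is illustrative):

    # Append debug-level librbd logging for client (qemu/librbd) processes.
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
        debug rbd = 20
        log file = /var/log/ceph/qemu-guest-$pid.log
    EOF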

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-22 Thread Hall, Eric
After some testing (doing heavy IO on an rbd-based VM with hung_task_timeout_secs=1 while manually requesting deep-scrubs on the underlying pgs (as determined via rados ls->osdmaptool)), I don’t think scrubbing is the cause. At least, I can’t make it happen this way… although I can’t *always* mak
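
A hedged sketch of one way to run that test; it substitutes `ceph osd map` for osdmaptool, and the pool name, object selection, and pgid are placeholders:

    # Inside the guest: make hung-task warnings fire almost immediately.
    sysctl -w kernel.hung_task_timeout_secs=1

    # On a cluster node: pick one data object of the image, find its PG, and
    # deep-scrub that PG while the guest is doing heavy IO.
    OBJ=$(rados -p vms ls | grep '^rbd_data' | head -n 1)
    ceph osd map vms "$OBJ"     # output names the pg, e.g. "... pg 3.1a ..."
    ceph pg deep-scrub 3.1a     # substitute the pgid reported above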

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Jason Dillaman
Do your VMs or OSDs show blocked requests? If you disable scrub or restart the blocked OSD, does the issue go away? If yes, it most likely is this issue [1]. [1] http://tracker.ceph.com/issues/20041 On Wed, Jun 21, 2017 at 3:33 PM, Hall, Eric wrote: > The VMs are using stock Ubuntu14/16 images s
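
A minimal sketch of those checks (the OSD id is a placeholder; the restart command assumes a systemd-managed OSD):

    # Look for slow/blocked request warnings and the OSDs they implicate.
    ceph health detail | grep -iE 'blocked|slow'
    # If one OSD is repeatedly implicated, restarting it clears the stuck op.
    systemctl restart ceph-osd@12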

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Hall, Eric
The VMs are using stock Ubuntu 14/16 images so yes, there is the default “/sbin/fstrim --all” in /etc/cron.weekly/fstrim. -- Eric On 6/21/17, 1:58 PM, "Jason Dillaman" wrote: Are some or many of your VMs issuing periodic fstrims to discard unused extents? On Wed, Jun 21, 2017 a
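
To test the fstrim theory directly, a hedged sketch run inside one of the affected guests (flags are standard util-linux fstrim options):

    # Run the same discard pass the weekly cron job performs, then check for
    # the kernel's hung-task messages.
    fstrim --all --verbose
    dmesg | grep -i 'blocked for more than'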

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Jason Dillaman
Are some or many of your VMs issuing periodic fstrims to discard unused extents? On Wed, Jun 21, 2017 at 2:36 PM, Hall, Eric wrote: > After following/changing all suggested items (turning off exclusive-lock > (and associated object-map and fast-diff), changing host cache behavior, > etc.) this is

Re: [ceph-users] rbd IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Hall, Eric
After following/changing all suggested items (turning off exclusive-lock (and associated object-map and fast-diff), changing host cache behavior, etc.), this is still a blocking issue for many uses of our OpenStack/Ceph installation. We have upgraded Ceph to 10.2.7, are running 4.4.0-62 or later
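
For reference, a sketch of the feature changes mentioned above for a single image (the pool/image spec is a placeholder; features are disabled in dependency order):

    # fast-diff depends on object-map, which depends on exclusive-lock.
    rbd feature disable vms/<image> fast-diff
    rbd feature disable vms/<image> object-map
    rbd feature disable vms/<image> exclusive-lock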