On 04/13/2017 11:02 AM, Jeff Cody wrote:
> On Thu, Apr 13, 2017 at 03:39:59PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Apr 13, 2017 at 01:45:55PM +0800, Paolo Bonzini wrote:
>>>
>>>
>>> On 13/04/2017 09:11, Jeff Cody wrote:
>>>>> It didn't make it into 2.9-rc4 because of limited time. :(
>>>>>
>>>>> Looks like there is no -rc5, we'll have to document this as a known issue.
>>>>> Users should "block-job-complete/cancel" as soon as possible to avoid
>>>>> such a hang.
>>>>
>>>> I'd argue for including a fix for 2.9, since this is both a regression, and
>>>> a hard lock without possible recovery short of restarting the QEMU process.
>>>
>>> It is a bit of a corner case (and jobs on I/O thread are relatively rare
>>> too), so maybe it's not worth delaying 2.9. It has been delayed already
>>> quite a bit. Another reason I think I prefer to wait is to ensure that
>>> we have an entry in qemu-iotests to avoid the future regression.
>>
>> I also think this does not require delaying the release:
>>
>> 1. It needs to be marked as a known issue in the release notes.
>> 2. Let's roll the 2.9.1 stable release within a month of 2.9.0.
>>
>> If both conditions are met then very few end users will be exposed to
>> the problem. I hope libvirt will create IOThreads by default soon but
>> for the time being it is not a widely used configuration.
>>
>
> Without the fix, iothreads are not usable in 2.9.0, because a running block
> job can create a deadlock by a guest-initiated reboot. I think losing the
> ability to use iothreads is enough reason to warrant a fix (especially if an
> -rc5 may happen anyway).
>
> -Jeff
>

Not that it's my area of expertise, but given that Fam's "hacky" patch
fixes two issues now and this is a deadlock that may indeed occur through
normal usage, I'd recommend it go into an rc5 if we're spinning one anyway.

+1 to Jeff's reasoning.

--js