Re: [Qemu-devel] [RFC PATCH 00/40] Sneak peek of virtio and dataplane changes for 2.6

Paolo Bonzini Wed, 16 Dec 2015 09:44:46 -0800


On 16/12/2015 13:54, Christian Borntraeger wrote:
> Just some quick remarks before I leave into vacation:
> 
> Performance seems to be better than the initial version. I have some
> hangs from time to time when shutting down (also with your previous
> version)


Yes, I've seen them too.  To reproduce, just start some "dd" and
repeatedly stop/cont.  It hangs after 3-4 tries typically.

It's like this:

        I/O thread                      main thread
        -----------------------------------------------------
                                        aio_context_acquire
                                        bdrv_drain
        bh->scheduled = 0
        call bottom half
           aio_context_acquire
           <hang>


The bottom half is hung in the I/O thread, and will never be processed
by the main thread because bh->scheduled = 0.

My current idea is that *all* callers of aio_poll have to call it
without the AioContext acquired, including bdrv_drain.  It's not a
problem now that Fam has fixed almost all bdrv_drain users to use
bdrv_drained_begin/end instead.

The nastiness here is two-fold:

1) doing

        aio_context_release
        aio_poll
        aio_context_acquire

in bdrv_drain is not very safe because the mutex is recursive.  (Have I
ever mentioned I don't like recursive mutexes?...)

2) there is still a race in bdrv_drain

                    bdrv_flush_io_queue(bs);
                    if (bdrv_requests_pending(bs)) {
                        busy = true;
                        aio_context_release(aio_context);
                        aio_poll(aio_context, busy);
                        aio_context_acquire(aio_context);

if we release before aio_poll, the last request can be completed
before aio_poll is invoked, and aio_poll will block forever.  In Linux
terms, bdrv_request_pending would be prepare_to_wait() and aio_poll
would be schedule().  But, we don't have prepare_to_wait().  I have a
hackish solution here, but I don't like it really.

Paolo

Re: [Qemu-devel] [RFC PATCH 00/40] Sneak peek of virtio and dataplane changes for 2.6

Reply via email to