On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote:
> For the NBD server to work with dataplane, it needs to correctly access
> the exported BDS. It makes the most sense to run both in the same
> AioContext, therefore this series implements methods for tracking a
> BDS's AioContext and makes NBD make use of this for keeping the clients
> connected to that BDS in the same AioContext.
>
> The reason this is an RFC and not a PATCH is my inexperience with AIO,
> coroutines and the like. Also, I'm not sure about what to do about the
> coroutines. The NBD server has up to two coroutines per client: one for
> receiving and one for sending. Theoretically, both have to be
> "transferred" to the new AioContext if it is changed; however, as far
> as I see it, coroutines are not really bound to an AioContext, they are
> simply run in the AioContext entering them. Therefore, I think a
> transfer is unnecessary. All coroutines are entered from nbd_read() and
> nbd_restart_write(), both of which are AIO routines registered via
> aio_set_fd_handler2().
>
> As bs_aio_detach() unregisters all of these routines, the coroutines
> can no longer be entered, but only after bs_aio_attach() is called
> again. Then, when they are called, they will enter the coroutines in
> the new AioContext. Therefore, I think an explicit transfer is
> unnecessary.

This reasoning sounds correct.
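
Just to spell out the lifecycle: the attach/detach callbacks only need to
move the fd handlers between AioContexts; the coroutines follow
automatically because they are only ever entered from those handlers.
Roughly something like this (the callback names and the ctx field are
made up for illustration, not taken from your series, and I'm assuming
aio_set_fd_handler2() takes the same arguments as qemu_set_fd_handler2()
plus the AioContext):

static void nbd_aio_attach(AioContext *ctx, void *opaque)
{
    NBDClient *client = opaque;

    client->ctx = ctx;

    /* Re-register the handlers in the new AioContext.  From now on
     * nbd_read()/nbd_restart_write() run there, so the client's
     * coroutines are entered in the new context without any explicit
     * transfer.
     */
    aio_set_fd_handler2(ctx, client->sock,
                        nbd_can_read, nbd_read, nbd_restart_write,
                        client);
}

static void nbd_aio_detach(void *opaque)
{
    NBDClient *client = opaque;

    /* Drop the handlers from the old AioContext.  Until the next
     * attach, nothing can enter the client's coroutines anymore.
     */
    aio_set_fd_handler2(client->ctx, client->sock,
                        NULL, NULL, NULL, NULL);
    client->ctx = NULL;
}
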
> However, if bs_aio_detach() is called from a different thread than the
> old AioContext is running in, we may still have coroutines running for
> which we should wait before returning from bs_aio_detach().

The bdrv_attach/detach_aio_context() APIs have rules regarding where
these functions are called from:

/**
 * bdrv_set_aio_context:
 *
 * Changes the #AioContext used for fd handlers, timers, and BHs by this
 * BlockDriverState and all its children.
 *
 * This function must be called from the old #AioContext or with a lock
 * held so the old #AioContext is not executing.
 */
void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);

and:

/* Remove fd handlers, timers, and other event loop callbacks so the event
 * loop is no longer in use.  Called with no in-flight requests and in
 * depth-first traversal order with parents before child nodes.
 */
void (*bdrv_detach_aio_context)(BlockDriverState *bs);

/* Add fd handlers, timers, and other event loop callbacks so I/O requests
 * can be processed again.  Called with no in-flight requests and in
 * depth-first traversal order with child nodes before parent nodes.
 */
void (*bdrv_attach_aio_context)(BlockDriverState *bs,
                                AioContext *new_context);

These rules ensure that it's safe to perform these operations.  You don't
have to support arbitrary callers in NBD either.

> But because of my inexperience with coroutines, I'm not sure. I now have
> these patches nearly unchanged here for about a week and I'm looking for
> ways of testing them, but so far I could only test whether the old use
> cases work, but not whether they will work for what they are intended to
> do: with BDS changing their AioContext.
>
> So, because I'm not sure what else to do and because I don't know how to
> test multiple AIO threads (how do I move a BDS into another iothread?)
> I'm just sending this out as an RFC.

Use a Linux guest with virtio-blk:

  qemu -drive if=none,file=test.img,id=drive0 \
       -object iothread,id=iothread0 \
       -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \
       ...

Once the guest has booted, the virtio-blk device will be in dataplane
mode.  That means drive0's BlockDriverState->aio_context will be the
IOThread AioContext and not the global qemu_aio_context.

Now you can exercise the run-time NBD server over QMP and check that
things still work.  For example, try running a few instances of
"dd if=/dev/vdb of=/dev/null iflag=direct" inside the guest to stress
guest I/O.

Typically what happens if code is not dataplane-aware is that a deadlock
or crash occurs due to race conditions between the QEMU main loop and the
IOThread for this virtio-blk device.

For an overview of dataplane programming concepts, see:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html

Stefan
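
P.S. In case a concrete sequence helps, the run-time NBD server can be
exercised over QMP roughly like this (host, port, and export name are
just examples):

  { "execute": "qmp_capabilities" }
  { "execute": "nbd-server-start",
    "arguments": { "addr": { "type": "inet",
                             "data": { "host": "127.0.0.1",
                                       "port": "10809" } } } }
  { "execute": "nbd-server-add",
    "arguments": { "device": "drive0", "writable": true } }

Then read the export from the host side, e.g. with qemu-io:

  qemu-io -c 'read 0 4k' nbd://127.0.0.1:10809/drive0

while the dd processes keep the IOThread busy inside the guest.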