Am 06.09.2012 12:18, schrieb Paolo Bonzini: > Il 06/09/2012 12:07, Kevin Wolf ha scritto: >>> The AIOCB is already invalid at the time the callback is entered, so we >>> could release it before the call. However, not all implementation of >>> AIO are ready for that and I'm not really in the mood for large scale >>> refactoring... >> >> But the way, what I'd really want to see in the end is to get rid of >> qemu_aio_flush() and replace it by .bdrv_drain() callbacks in each >> BlockDriver. The way we're doing it today is a layering violation. > > That's quite difficult. Completion of an I/O operation can trigger > another I/O operation on another block device, and so on until we go > back to the first device (think of a hypothetical RAID-5 device).
You always have a tree of BDSes, and children should only ever trigger completion of I/O operations in their parents. Am I missing anything? >> Doesn't change anything about this problem, though. So the options that >> we have are: >> >> 1. Delay the callback using a BH. Doing this in each driver is ugly. >> But is there actually more than one possible callback in today's >> coroutine world? I only see bdrv_co_io_em_complete(), which could >> reenter the coroutine from a BH. > > Easy and safe, but it feels a bit like a timebomb. Also, I'm not > entirely sure of _why_ the bottom half works. :) Hm, safe and time bomb is contradictory in my book. :-) The bottom half work because we're not reentering the qcow2_create coroutine immediately, so the gluster AIO callback can complete all of its cleanup work without being interrupted by code that might wait on this particular request and create a deadlock this way. >> 2. Delay the callback by just calling it later when the cleanup has >> been completed and .io_flush() can return 0. You say that it's hard >> to implement for some drivers, except if the AIOCB are leaked until >> the end of functions like qcow2_create(). > > ... which is what we do in posix-aio-compat.c; nobody screamed so far. True. Would be easy to fix in posix-aio-compat, though, or can a callback expect that the AIOCB is still valid? > Not really hard, it just has to be assessed for each driver separately. > We can just do it in gluster and refactor it later. Okay, so let's keep it as an option for now. >> 3. Add a delay only later in functions like bdrv_drain_all() that assume >> that the request has completed. Are there more of this type? AIOCBs >> are leaked until a bdrv_drain_all() call. Does it work with draining >> specific BDSes instead of everything? >> >> Unless I forgot some important point, it almost looks like option 1 is >> the easiest and safest. > > I agree with your opinion, but I would feel better if I understood > better why it works. (2) can be done easily in each driver (no > ugliness) and refactored later. I think option 2 must be done in each driver by design, or do you see even a theoretical way how to do it generically? Kevin