On 29.10.2010 17:28, Anthony Liguori wrote:
> On 10/29/2010 09:57 AM, Kevin Wolf wrote:
>> On 29.10.2010 16:40, Anthony Liguori wrote:
>>> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>>>> On 29.10.2010 16:15, Anthony Liguori wrote:
>>>>> I don't think it's a bad idea to do that but to the extent that the
>>>>> block API is designed after posix file I/O, close does not usually
>>>>> imply flush.
>>>>>
>>>> I don't think it really resembles POSIX. More or less the only thing
>>>> they have in common is that both provide open, read, write and close,
>>>> which is something that probably any API for file accesses provides.
>>>>
>>>> The operation you're talking about here is bdrv_flush/fsync, which is
>>>> not implied by a POSIX close?
>>>>
>>> Yes. But I think for the purposes of this patch, a bdrv_cancel_all()
>>> would be just as good. The intention is to eliminate pending I/O
>>> requests, the fsync is just a side effect.
>>>
>> Well, if I'm not mistaken, bdrv_flush would provide only this side
>> effect and not the semantics that you're really looking for. This is why
>> I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
>> introduce a qemu_aio_flush variant that flushes only one
>> BlockDriverState - this is what you really want.
>>
>>>>>> And why do we have to flush here, but not before other uses of
>>>>>> bdrv_close(), such as eject_device()?
>>>>>>
>>>>> Good question. Kevin should also confirm, but looking at the code, I
>>>>> think flush() is needed before close. If there's a pending I/O event
>>>>> and you close before the I/O event is completed, you'll get a callback
>>>>> for completion against a bogus BlockDriverState.
>>>>>
>>>>> I can't find anything in either raw-posix or the generic block layer
>>>>> that would mitigate this.
>>>>>
>>>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>>>
>>>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>>>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>>>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>>>> anyway.
>>>>
>>> I agree. Re: qemu_aio_flush, we have to wait for it to complete, which
>>> gets a little complicated in bdrv_close().
>>>
>> qemu_aio_flush is the function that waits for requests to complete.
>>
> Please excuse me while my head explodes ;-)
>
> I think we've got a bit of a problem.
>
> We have:
>
> 1) bdrv_flush() - sends an fdatasync
>
> 2) bdrv_aio_flush() - sends an fdatasync using the thread pool
>
> 3) qemu_aio_flush() - waits for all pending aio requests to complete
>
> But we use bdrv_aio_flush() to implement a barrier and we don't actually
> preserve those barrier semantics in the thread pool.
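(For the bdrv_close() part further up: what I have in mind is roughly the
following. Untested sketch, not a patch against the current tree, and the
function names are just the existing ones from the thread; qemu_aio_flush()
today waits for requests on every BlockDriverState, the per-BDS variant I
mentioned doesn't exist yet.)

/* Context: this would live in block.c, so BlockDriverState and the
 * bdrv_*/qemu_aio_* helpers come from the QEMU-internal headers. */
void bdrv_close(BlockDriverState *bs)
{
    if (bs->drv) {
        /* Drain all in-flight AIO first, so that no completion callback
         * can fire against a closed BDS.  Note that this currently waits
         * for requests on *all* BlockDriverStates, not just this one. */
        qemu_aio_flush();

        /* Not strictly needed for correctness, but bdrv_close is rare
         * enough that an extra fdatasync can't hurt. */
        bdrv_flush(bs);

        /* ...and only then tear the BDS down as before. */
        bs->drv->bdrv_close(bs);
        bs->drv = NULL;
    }
}
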
Not really. We use bdrv_aio_flush() to implement flush commands, which I
think don't necessarily constitute a barrier by themselves.

> That is:
>
> If I do:
>
> bdrv_aio_write() -> A
> bdrv_aio_write() -> B
> bdrv_aio_flush() -> C
>
> This will get queued as three requests on the thread pool. (A) is a
> write, (B) is a write, and (C) is an fdatasync.
>
> But if this gets picked up by three separate threads, the ordering isn't
> guaranteed. It might be C, B, A. So semantically, is bdrv_aio_flush()
> supposed to flush any *pending* writes or any *completed* writes? If
> it's the latter, we're okay, but if it's the former, we're broken.

Right, so don't do that. ;-)

bdrv_aio_flush, as I understand it, is meant to flush only completed
writes. We've had this discussion before and if I understood right, this
is also how real hardware generally works. So to get barrier semantics
you as an OS need to flush your queue, i.e. you wait for A and B to
complete before you issue C.

Christoph should be able to give more details on this.

Kevin
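PS: To spell out what I mean by "flush your queue", here's a rough sketch
from the caller's side. The BarrierState struct and the two functions are
made up for the example; only the bdrv_aio_writev()/bdrv_aio_flush()
signatures are the existing ones, and error handling is left out.

#include "block.h"   /* bdrv_aio_writev, bdrv_aio_flush, BlockDriverCompletionFunc */

typedef struct BarrierState {
    BlockDriverState *bs;
    int writes_in_flight;
    BlockDriverCompletionFunc *cb;  /* caller's completion for the flush */
    void *opaque;
} BarrierState;

static void barrier_write_cb(void *opaque, int ret)
{
    BarrierState *s = opaque;

    if (--s->writes_in_flight == 0) {
        /* A and B have completed, so the fdatasync now covers them no
         * matter how the thread pool schedules its work. */
        bdrv_aio_flush(s->bs, s->cb, s->opaque);
    }
}

static void submit_writes_then_flush(BarrierState *s,
                                     int64_t sector_a, QEMUIOVector *qiov_a,
                                     int64_t sector_b, QEMUIOVector *qiov_b,
                                     int nb_sectors)
{
    s->writes_in_flight = 2;
    bdrv_aio_writev(s->bs, sector_a, qiov_a, nb_sectors, barrier_write_cb, s);
    bdrv_aio_writev(s->bs, sector_b, qiov_b, nb_sectors, barrier_write_cb, s);
    /* C is only submitted from barrier_write_cb() once both writes are
     * done - that's the "flush your queue" part. */
}

The point is simply that C is not in flight at the same time as A and B,
so there is nothing for the thread pool to reorder.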