On 12/08/2014 10:12, Ming Lei wrote:
>> The below patch is basically the minimal change to bypass coroutines.
>> Of course the block.c part is not acceptable as is (the change to
>> refresh_total_sectors is broken, the others are just ugly), but it is
>> a start. Please run it with your fio workloads, or write an aio-based
>> version of a qemu-img/qemu-io *I/O* benchmark.
>
> Could you explain why the new change is introduced?

It provides a fast path for bdrv_aio_readv/writev whenever there is
nothing to do after the driver routine returns. In this case there is
no need to wrap the AIOCB returned by the driver routine.

It doesn't go all the way, and in particular it doesn't completely
reverse the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
But it is enough to provide something that is not dataplane-specific,
does not break various functionality that we need to add to dataplane
virtio-blk, does not mess up the semantics of the block layer, and lets
you run benchmarks.

> I will hold it until we can align to the coroutine cost computation,
> because it is very important for the discussion.

First of all, note that the coroutine cost is totally pointless in the
discussion unless you have 100% CPU time and the dataplane thread
becomes CPU bound. You haven't said if this is the case.

Second, if the coroutine cost is relevant, the profile is really too
flat to do much about it. The only solution (and here I *think* I
disagree slightly with Kevin) is to get rid of it, which is not even
too hard to do.

The problem is that your patches to do that touch too much code and
subtly break too much stuff. The one I wrote does have a little
breakage because I don't understand bs->growable 100% and I didn't
really put much effort into it (my deadline being basically "be done as
soon as the shower is free"), and it is ugly as hell, _but_ it should
be compatible with the way the block layer works.

Paolo