On Fri, Mar 07, 2025 at 11:16:30PM +0100, Kevin Wolf wrote:
> Until now, FUA was always emulated with a separate flush after the write
> for file-posix. The overhead of processing a second request can reduce
> performance significantly for a guest disk that has disabled the write
> cache, especially if the host disk is already write through, too, and
> the flush isn't actually doing anything.
>
> Advertise support for REQ_FUA in write requests and implement it for
> Linux AIO and io_uring using the RWF_DSYNC flag for write requests. The
> thread pool still performs a separate fdatasync() call. This can be
> improved later by using the pwritev2() syscall if available.
>
> As an example, this is how fio numbers can be improved in some scenarios
> with this patch (all using virtio-blk with cache=directsync on an nvme
> block device for the VM, fio with ioengine=libaio,direct=1,sync=1):
>
>                               |      old      | with FUA support
> ------------------------------+---------------+-------------------
> bs=4k, iodepth=1, numjobs=1   |  45.6k iops   |  56.1k iops
> bs=4k, iodepth=1, numjobs=16  | 183.3k iops   | 236.0k iops
> bs=4k, iodepth=16, numjobs=1  | 258.4k iops   | 311.1k iops
>
> However, not all scenarios are clear wins. On another slower disk I saw
> little to no improvement. In fact, in two corner case scenarios, I even
> observed a regression, which I however consider acceptable:
>
> 1. On slow host disks in a write through cache mode, when the guest is
>    using virtio-blk in a separate iothread so that polling can be
>    enabled, and each completion is quickly followed up with a new
>    request (so that polling gets it), it can happen that enabling FUA
>    makes things slower - the additional very fast no-op flush we used to
>    have gave the adaptive polling algorithm a success so that it kept
>    polling. Without it, we only have the slow write request, which
>    disables polling. This is a problem in the polling algorithm that
>    will be fixed later in this series.
>
> 2. With a high queue depth, it can be beneficial to have flush requests
>    for another reason: The optimisation in bdrv_co_flush() that flushes
>    only once per write generation acts as a synchronisation mechanism
>    that lets all requests complete at the same time. This can result in
>    better batching and if the disk is very fast (I only saw this with a
>    null_blk backend), this can make up for the overhead of the flush and
>    improve throughput. In theory, we could optionally introduce a
>    similar artificial latency in the normal completion path to achieve
>    the same kind of completion batching. This is not implemented in this
>    series.
>
> Compatibility is not a concern for io_uring, it has supported RWF_DSYNC
> from the start. Linux AIO started supporting it in Linux 4.13 and libaio
> 0.3.111. The kernel is not a problem for any supported build platform,
> so it's not necessary to add runtime checks. However, openSUSE is still
> stuck with an older libaio version that would break the build. We must
> detect this at build time to avoid build failures.
>
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  include/block/raw-aio.h |  8 ++++++--
>  block/file-posix.c      | 26 ++++++++++++++++++--------
>  block/io_uring.c        | 13 ++++++++-----
>  block/linux-aio.c       | 24 +++++++++++++++++++++---
>  meson.build             |  4 ++++
>  5 files changed, 57 insertions(+), 18 deletions(-)
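
The thread pool follow-up mentioned above essentially amounts to issuing
the write and the data sync as a single syscall. A minimal sketch of that
idea, assuming a kernel and glibc that support RWF_DSYNC with pwritev2()
(the helper name and error convention here are made up for illustration;
this is not the file-posix code):

#define _GNU_SOURCE
#include <sys/uio.h>
#include <errno.h>

/* Illustrative helper: one durable (FUA-like) write, no separate flush */
static ssize_t fua_pwrite(int fd, void *buf, size_t len, off_t offset)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    /* RWF_DSYNC gives this single request O_DSYNC semantics */
    ssize_t ret = pwritev2(fd, &iov, 1, offset, RWF_DSYNC);
    return ret < 0 ? -errno : ret;
}

With something along these lines, a guest FUA write in write-through mode
costs one request instead of a write plus a separate fdatasync().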
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>