On Fri, Mar 07, 2025 at 11:16:30PM +0100, Kevin Wolf wrote:
> Until now, file-posix always emulated FUA with a separate flush after the
> write. The overhead of processing a second request can reduce performance
> significantly for a guest disk that has disabled the write cache,
> especially if the host disk is already write-through too and the flush
> isn't actually doing anything.
> 
> Advertise support for REQ_FUA in write requests and implement it for
> Linux AIO and io_uring using the RWF_DSYNC flag. The thread pool still
> performs a separate fdatasync() call. This can be improved later by
> using the pwritev2() syscall if available.
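For illustration, here is a minimal standalone sketch of a FUA write
expressed as a single pwritev2() call with RWF_DSYNC, the route the commit
message mentions as a possible later improvement for the thread pool
backend. This is not code from the patch; the file name and buffer size
are arbitrary placeholders, and it assumes a kernel and glibc new enough
to provide pwritev2() and RWF_DSYNC:

    /*
     * Sketch only: one write that is durable when the call returns, so
     * no separate flush/fdatasync() request is needed afterwards.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        memset(buf, 0xab, sizeof(buf));

        int fd = open("/tmp/fua-example.img", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

        /* RWF_DSYNC gives this single request O_DSYNC semantics (FUA). */
        ssize_t ret = pwritev2(fd, &iov, 1, 0, RWF_DSYNC);
        if (ret < 0) {
            perror("pwritev2");
            close(fd);
            return 1;
        }

        printf("wrote %zd bytes with RWF_DSYNC\n", ret);
        close(fd);
        return 0;
    }

The same idea applies to the Linux AIO and io_uring paths: durability is
requested per write, so the second request for the flush and its extra
completion round trip disappear.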
> 
> As an example, this is how fio numbers can be improved in some scenarios
> with this patch (all using virtio-blk with cache=directsync on an nvme
> block device for the VM, fio with ioengine=libaio,direct=1,sync=1):
> 
>                               | old           | with FUA support
> ------------------------------+---------------+-------------------
> bs=4k, iodepth=1, numjobs=1   |  45.6k iops   |  56.1k iops
> bs=4k, iodepth=1, numjobs=16  | 183.3k iops   | 236.0k iops
> bs=4k, iodepth=16, numjobs=1  | 258.4k iops   | 311.1k iops
> 
> However, not all scenarios are clear wins. On another, slower disk I saw
> little to no improvement. In fact, in two corner-case scenarios, I even
> observed a regression, which I nevertheless consider acceptable:
> 
> 1. On slow host disks in a write-through cache mode, when the guest is
>    using virtio-blk in a separate iothread so that polling can be
>    enabled, and each completion is quickly followed by a new request
>    (so that polling picks it up), enabling FUA can make things slower:
>    the additional, very fast no-op flush we used to issue counted as a
>    success for the adaptive polling algorithm, so it kept polling.
>    Without it, only the slow write request remains, which disables
>    polling. This is a problem in the polling algorithm and will be
>    fixed later in this series.
> 
> 2. With a high queue depth, it can be beneficial to have flush requests
>    for another reason: the optimisation in bdrv_co_flush() that flushes
>    only once per write generation acts as a synchronisation mechanism
>    that lets all requests complete at the same time. This can result in
>    better batching, and if the disk is very fast (I only saw this with a
>    null_blk backend), it can make up for the overhead of the flush and
>    improve throughput. In theory, we could optionally introduce a
>    similar artificial latency in the normal completion path to achieve
>    the same kind of completion batching. This is not implemented in this
>    series.
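As background for point 2, here is a generic sketch of the "flush once per
write generation" pattern; this is not QEMU's actual bdrv_co_flush() code,
and the struct and function names are made up for illustration:

    /*
     * Every completed write bumps write_gen; a flush records which
     * generation it covered, and any later flush that finds nothing
     * newer returns without touching the disk, so flushes for the
     * same generation collapse into a single fdatasync().
     */
    #include <unistd.h>

    typedef struct {
        int fd;
        unsigned long write_gen;   /* bumped on every completed write */
        unsigned long flushed_gen; /* generation covered by last flush */
    } DiskState;

    void disk_write_done(DiskState *s)
    {
        s->write_gen++;
    }

    int disk_flush(DiskState *s)
    {
        unsigned long current_gen = s->write_gen;

        if (s->flushed_gen == current_gen) {
            /* Nothing written since the last flush: skip the syscall. */
            return 0;
        }

        if (fdatasync(s->fd) < 0) {
            return -1;
        }

        s->flushed_gen = current_gen;
        return 0;
    }

The sketch omits one detail that matters for the batching effect: as far
as I understand it, a flush that finds another flush for the same
generation already in flight waits for it instead of returning
immediately, which is what lets all the outstanding requests complete at
the same time.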
> 
> Compatibility is not a concern for io_uring, which has supported
> RWF_DSYNC from the start. Linux AIO gained support for it in Linux 4.13
> and libaio 0.3.111. The kernel side is not a problem for any supported
> build platform, so no runtime checks are necessary. However, openSUSE
> still ships an older libaio version that would break the build, so the
> missing support must be detected at build time.
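Judging from the meson.build change in the diffstat below, the detection
happens at the build-system level. As a rough illustration only (my
assumption, not the patch's actual check), such a test can boil down to a
small compile probe that succeeds only when libaio's struct iocb has the
aio_rw_flags field through which per-request flags like RWF_DSYNC are
passed:

    /*
     * Hypothetical compile probe: builds only against a libaio whose
     * struct iocb exposes aio_rw_flags (added in libaio 0.3.111).
     */
    #include <libaio.h>

    int main(void)
    {
        struct iocb cb;
        cb.aio_rw_flags = 0;
        return (int)cb.aio_rw_flags;
    }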
> 
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  include/block/raw-aio.h |  8 ++++++--
>  block/file-posix.c      | 26 ++++++++++++++++++--------
>  block/io_uring.c        | 13 ++++++++-----
>  block/linux-aio.c       | 24 +++++++++++++++++++++---
>  meson.build             |  4 ++++
>  5 files changed, 57 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
