On Tue, Mar 25, 2025 at 01:48:35PM +0100, Kevin Wolf wrote:
> On 06.03.2025 at 11:33, Kevin Wolf wrote:
> > On 04.03.2025 at 16:52, Alberto Faria wrote:
> > > Avoid emulating FUA when the driver supports it natively. This should
> > > provide better performance than a full flush after the write.
> > > 
> > > Signed-off-by: Alberto Faria <afa...@redhat.com>
> > 
> > Did you try out if you can see performance improvements in practice?
> > It's always nice to have numbers in the commit message for patches that
> > promise performance improvements.
> 
> I was curious enough to see how this and the recent series by Stefan
> (virtio-scsi multiqueue) and myself (FUA on the backend + polling
> improvements) play out with virtio-scsi, so I just ran some fio
> benchmarks with sync=1 myself to compare:
> 
> iops bs=4k cache=none           |    virtio-scsi    |     virtio-blk    |
> O_SYNC workload                 |   qd 1  |  qd 16  |   qd 1  |  qd 16  |
> --------------------------------+---------+---------+---------+---------+
> master                          |   21296 |  109747 |   25762 |  130576 |
> + virtio-scsi multiqueue        |   28798 |  121170 |       - |       - |
> + FUA in scsi-disk              |   51893 |  204199 |       - |       - |
> --------------------------------+---------+---------+---------+---------+
> Total change                    | +143.7% |  +86.1% |       - |       - |
> 
> (No new numbers for virtio-blk because virtio-scsi patches obviously
> don't change anything about it. Also no numbers for FUA in file-posix
> because it's unused with cache=none.)
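>
> A fio job along these lines reproduces this kind of workload inside
> the guest (bs, sync and iodepth match the table; the rw pattern,
> device path, ioengine, direct flag and runtime are placeholders):
>
>   fio --filename=/dev/vdb --rw=randwrite --bs=4k --sync=1 --direct=1 \
>       --ioengine=libaio --iodepth=16 --runtime=60 --time_based \
>       --name=osync-write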
> 
> iops bs=4k cache=directsync     |    virtio-scsi    |     virtio-blk    |
> O_SYNC workload                 |   qd 1  |  qd 16  |   qd 1  |  qd 16  |
> --------------------------------+---------+---------+---------+---------+
> master                          |   32223 |  109748 |   45583 |  258416 |
> + FUA in file-posix + polling   |   32148 |  198665 |   58601 |  320190 |
> + virtio-scsi multiqueue        |   51739 |  225031 |       - |       - |
> + FUA in scsi-disk              |   56061 |  227535 |       - |       - |
> --------------------------------+---------+---------+---------+---------+
> Total change                    |  +74.0% | +107.3% |  +28.6% |  +23.9% |
> 
> Of course, the huge improvements on the virtio-scsi side only show how
> bad it was before. In most configurations virtio-scsi still trails
> virtio-blk even after all three patch series (apart from cache=none,
> where the availability of FUA on the device side makes a big
> difference, and I expect virtio-blk to improve similarly once we
> implement it there).
> 
> Also note that when testing the virtio-scsi multiqueue patches, this
> was still a single iothread, i.e. I wasn't even making use of the new
> feature per se. I assume much of the gain comes from enabling polling:
> the series moved the event queue handling to the main loop, and keeping
> it in the iothread is what had prevented polling for virtio-scsi
> before. The series also got rid of an extra coroutine per request for
> the blk_is_available() call in virtio_scsi_ctx_check(), which might
> play a role, too.
> 
> Anyway, I like these numbers for FUA in scsi-disk. It makes writeback
> cache modes almost catch up to writethrough with O_SYNC workloads. We
> should definitely get this merged and do the same for virtio-blk.

Thanks for sharing! Nice IOPS improvements across the board.
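
For readers who haven't looked at the patch itself: the idea is that
when the layer below advertises native FUA support, the write is
submitted with the FUA flag set and is durable on completion; only when
it doesn't is FUA emulated with a full flush after the write, which is
the slower path the numbers above compare against. A minimal sketch of
that decision in C, with hypothetical names rather than the actual QEMU
code:

/* Minimal sketch of "native FUA vs. emulated FUA". Hypothetical names,
 * not the actual QEMU code. */
#include <stdbool.h>
#include <stdio.h>

#define REQ_FUA 0x1   /* caller wants the data durable on completion */

/* Hypothetical backend descriptor. */
typedef struct Backend {
    const char *name;
    unsigned supported_write_flags;   /* flags the driver handles itself */
} Backend;

/* Stand-ins for the real driver entry points. */
static int backend_write(Backend *b, unsigned flags)
{
    printf("%s: write%s\n", b->name, (flags & REQ_FUA) ? " (FUA)" : "");
    return 0;
}

static int backend_flush(Backend *b)
{
    printf("%s: flush\n", b->name);
    return 0;
}

/* If the backend advertises FUA, pass the flag through and let it do a
 * single durable write. Otherwise strip the flag and emulate it with a
 * full flush once the write has completed. */
static int submit_write(Backend *b, unsigned flags)
{
    bool emulate_fua = false;
    int ret;

    if ((flags & REQ_FUA) && !(b->supported_write_flags & REQ_FUA)) {
        flags &= ~REQ_FUA;
        emulate_fua = true;
    }

    ret = backend_write(b, flags);
    if (ret == 0 && emulate_fua) {
        ret = backend_flush(b);
    }
    return ret;
}

int main(void)
{
    Backend native   = { "native-fua-backend", REQ_FUA };
    Backend emulated = { "no-fua-backend", 0 };

    submit_write(&native, REQ_FUA);    /* one durable write */
    submit_write(&emulated, REQ_FUA);  /* write + separate flush */
    return 0;
}

With native FUA the backend completes a single durable write instead of
a write followed by a separate flush, which is presumably where the
large cache=none gains come from.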

Stefan
