On Tue, Mar 25, 2025 at 01:48:35PM +0100, Kevin Wolf wrote:
> On 06.03.2025 at 11:33, Kevin Wolf wrote:
> > On 04.03.2025 at 16:52, Alberto Faria wrote:
> > > Avoid emulating FUA when the driver supports it natively. This should
> > > provide better performance than a full flush after the write.
> > >
> > > Signed-off-by: Alberto Faria <afa...@redhat.com>
> >
> > Did you try out if you can see performance improvements in practice?
> > It's always nice to have numbers in the commit message for patches that
> > promise performance improvements.
>
> I was curious enough to see how this and the recent series by Stefan
> (virtio-scsi multiqueue) and myself (FUA on the backend + polling
> improvements) play out with virtio-scsi, so I just ran some fio
> benchmarks with sync=1 myself to compare:
>
> iops bs=4k cache=none           | virtio-scsi       | virtio-blk        |
> O_SYNC workload                 |   qd 1  |  qd 16  |   qd 1  |  qd 16  |
> --------------------------------+---------+---------+---------+---------+
> master                          |   21296 |  109747 |   25762 |  130576 |
> + virtio-scsi multiqueue        |   28798 |  121170 |       - |       - |
> + FUA in scsi-disk              |   51893 |  204199 |       - |       - |
> --------------------------------+---------+---------+---------+---------+
> Total change                    | +143.7% |  +86.1% |       - |       - |
>
> (No new numbers for virtio-blk because virtio-scsi patches obviously
> don't change anything about it. Also no numbers for FUA in file-posix
> because it's unused with cache=none.)
>
> iops bs=4k cache=directsync     | virtio-scsi       | virtio-blk        |
> O_SYNC workload                 |   qd 1  |  qd 16  |   qd 1  |  qd 16  |
> --------------------------------+---------+---------+---------+---------+
> master                          |   32223 |  109748 |   45583 |  258416 |
> + FUA in file-posix + polling   |   32148 |  198665 |   58601 |  320190 |
> + virtio-scsi multiqueue        |   51739 |  225031 |       - |       - |
> + FUA in scsi-disk              |   56061 |  227535 |       - |       - |
> --------------------------------+---------+---------+---------+---------+
> Total change                    |  +74.0% | +107.3% |  +28.6% |  +23.9% |
>
> Of course, the huge improvements on the virtio-scsi side only show how
> bad it was before. In most numbers it is still behind virtio-blk even
> after all three patch series (apart from cache=none where the
> availability of FUA on the device side makes a big difference, and I
> expect that virtio-blk will improve similarly once we implement it
> there).
>
> Also note that when testing the virtio-scsi multiqueue patches, this
> was still a single iothread, i.e. I wasn't even making use of the new
> feature per se. I assume much of this comes from enabling polling
> because the series moved the event queue handling to the main loop,
> which prevented polling for virtio-scsi before. The series also got rid
> of an extra coroutine per request for the blk_is_available() call in
> virtio_scsi_ctx_check(), which might play a role, too.
>
> Anyway, I like these numbers for FUA in scsi-disk. It makes write back
> cache modes almost catch up to write through with O_SYNC workloads. We
> should definitely get this merged and do the same for virtio-blk.

Thanks for sharing! Nice IOPS improvements across the board.

Stefan