The commit is pushed to "branch-rh9-5.14.0-362.8.1.vz9.35.x-ovz"
and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-362.8.1.vz9.35.5

------>
commit e10e2fafaa6f14eb62ae7c83ad19e5e336b3e97a
Author: Konstantin Khorenko <khore...@virtuozzo.com>
Date:   Thu Jan 4 20:02:10 2024 +0300
    FD: vhost-blk: in-kernel accelerator for virtio-blk guests

    https://jira.sw.ru/browse/PSBM-139414

    Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com>

    Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
---
 ...in_kernel_accelerator_for_virtio_blk_guests.rst | 85 ++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/Documentation/Virtuozzo/FeatureDescriptions/vhost-blk-in_kernel_accelerator_for_virtio_blk_guests.rst b/Documentation/Virtuozzo/FeatureDescriptions/vhost-blk-in_kernel_accelerator_for_virtio_blk_guests.rst
new file mode 100644
index 000000000000..5eba1b592a37
--- /dev/null
+++ b/Documentation/Virtuozzo/FeatureDescriptions/vhost-blk-in_kernel_accelerator_for_virtio_blk_guests.rst
@@ -0,0 +1,85 @@
+======================================================
+vhost-blk: in-kernel accelerator for virtio-blk guests
+======================================================
+
+Background:
+===========
+
+Right now each IO request from the guest takes the following path:
+
+* the guest kernel puts the IO request into a virtio queue
+* the guest kernel performs a VM exit
+* the host (in the context of the VCPU thread) kicks the IOthread in
+  QEMU via an ioeventfd and performs a VM enter
+* the IOthread wakes up
+* the IOthread serves the request through
+  - the VirtIO BLK driver
+  - the QCOW2 format driver
+  - the host kernel
+* once the request is completed (again, a wakeup of a userspace
+  process), an IRQ has to be injected into the guest (one more
+  context switch and a syscall)
+
+This process is lengthy and does not scale with the number of guest
+CPUs.
+
+Disk latency can be reduced if we handle virtio-blk requests in the
+host kernel (as VirtIO Net does with the vhost_net module), so that
+we avoid a lot of syscalls and context switches (a control-plane
+sketch of this handoff is appended at the end of this document).
+
+The main problem with this approach *was* the absence of a
+thin-provisioned virtual disk in the kernel and the inability to
+perform backups.
+
+The idea is quite simple: QEMU gives us a block device, and we
+translate every incoming virtio request into a bio and push it into
+the bdev (see the sketch at the end of this section). The biggest
+disadvantage of this vhost-blk flavor is that it only works with the
+raw format.
+
+Luckily, Kirill Tkhai proposed a device mapper driver for the QCOW2
+format that attaches files as block devices:
+https://www.spinics.net/lists/kernel/msg4292965.html
+
+Also, by using kernel modules we can bypass the iothread limitation
+and finally scale block requests with the number of CPUs for
+high-performance devices.
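+
+The following is a minimal, hypothetical sketch of that
+request-to-bio translation, not the actual module code: the
+``vhost_blk_req`` structure and function names are illustrative, and
+the post-5.18 ``bio_alloc()`` signature is assumed::
+
+    /*
+     * Hypothetical sketch: turn one virtio-blk request into a bio
+     * and submit it to the backing block device.
+     */
+    #include <linux/bio.h>
+    #include <linux/blkdev.h>
+
+    struct vhost_blk_req {             /* illustrative, not real */
+        struct block_device *bdev;     /* backend given to us by QEMU */
+        sector_t sector;               /* from virtio_blk_outhdr */
+        struct bio_vec *bvec;          /* guest buffers, already mapped */
+        unsigned short nr_vecs;
+        bool is_write;
+    };
+
+    static void vhost_blk_bio_done(struct bio *bio)
+    {
+        /* mark the request used in the vq and kick the call eventfd */
+        bio_put(bio);
+    }
+
+    static void vhost_blk_submit(struct vhost_blk_req *req)
+    {
+        unsigned int opf = req->is_write ? REQ_OP_WRITE : REQ_OP_READ;
+        struct bio *bio;
+        int i;
+
+        /* GFP_KERNEL allocation may sleep but does not fail here */
+        bio = bio_alloc(req->bdev, req->nr_vecs, opf, GFP_KERNEL);
+        bio->bi_iter.bi_sector = req->sector;
+        bio->bi_end_io = vhost_blk_bio_done;
+
+        /* cannot fail: the bio was allocated with nr_vecs slots */
+        for (i = 0; i < req->nr_vecs; i++)
+            bio_add_page(bio, req->bvec[i].bv_page,
+                         req->bvec[i].bv_len, req->bvec[i].bv_offset);
+
+        submit_bio(bio);    /* async; completion runs bio_done */
+    }
+
+Everything after ``submit_bio()`` is the regular host block layer, so
+the request reaches the device without a single return to userspace.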
+
+
+Implementation details:
+=======================
+
+There have already been several attempts to write vhost-blk:
+
+- Asias' version: https://lkml.org/lkml/2012/12/1/174
+- Badari's version: https://lwn.net/Articles/379864/
+- Vitaly's version: https://lwn.net/Articles/770965/
+
+The main difference between them is the API used to access the
+backend file. The fastest one is Asias' version with the bio flavor;
+it is also the most reviewed and has the most features, so the
+vhost_blk module is partially based on it. Multiple virtqueue support
+was added and some places were reworked; support for several vhost
+workers was added as well.
+
+Test setup::
+
+    fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
+    QEMU drive options: cache=none
+    filesystem: xfs
+
+Test results::
+
+    SSD:
+                     | randread, IOPS | randwrite, IOPS |
+    Host             |     95.8k      |      85.3k      |
+    QEMU virtio      |     57.5k      |      79.4k      |
+    QEMU vhost-blk   |     95.6k      |      84.3k      |
+
+    RAMDISK (vq == vcpu):
+                     | randread, IOPS | randwrite, IOPS |
+    virtio, 1vcpu    |      123k      |       129k      |
+    virtio, 2vcpu    |    253k (??)   |     250k (??)   |
+    virtio, 4vcpu    |      158k      |       154k      |
+    vhost-blk, 1vcpu |      110k      |       113k      |
+    vhost-blk, 2vcpu |      247k      |       252k      |
+    vhost-blk, 8vcpu |      497k      |       469k      | single kernel thread
+    vhost-blk, 8vcpu |      730k      |       701k      | two kernel threads
+
+
+https://jira.sw.ru/browse/PSBM-139414
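+
+Appendix: the control-plane sketch referenced in the Background
+section. This is a hypothetical userspace fragment showing how a VMM
+such as QEMU hands one virtqueue to an in-kernel vhost backend. Only
+the generic ``VHOST_SET_*`` ioctls are stable vhost UAPI; the
+``/dev/vhost-blk`` node name and the backend-setting ioctl are
+assumptions modeled on vhost-net::
+
+    #include <fcntl.h>
+    #include <sys/eventfd.h>
+    #include <sys/ioctl.h>
+    #include <linux/vhost.h>
+
+    int setup_vhost_blk_vring(void)
+    {
+        struct vhost_vring_state num = { .index = 0, .num = 128 };
+        struct vhost_vring_file kick = { .index = 0, .fd = eventfd(0, 0) };
+        struct vhost_vring_file call = { .index = 0, .fd = eventfd(0, 0) };
+        int vhost = open("/dev/vhost-blk", O_RDWR);  /* assumed node name */
+
+        if (vhost < 0 || kick.fd < 0 || call.fd < 0)
+            return -1;
+
+        ioctl(vhost, VHOST_SET_OWNER);            /* bind to this process */
+        ioctl(vhost, VHOST_SET_VRING_NUM, &num);  /* queue depth */
+
+        /* guest kick -> kick.fd (wired to KVM as an ioeventfd) ->
+         * kernel worker; QEMU is never woken on the data path */
+        ioctl(vhost, VHOST_SET_VRING_KICK, &kick);
+
+        /* completion -> call.fd (wired to KVM as an irqfd) -> guest
+         * IRQ, again without a trip through userspace */
+        ioctl(vhost, VHOST_SET_VRING_CALL, &call);
+
+        /* VHOST_SET_MEM_TABLE, VHOST_SET_VRING_ADDR and the
+         * module-specific "set backend" ioctl (name assumed, modeled
+         * on VHOST_NET_SET_BACKEND) are omitted for brevity */
+        return vhost;
+    }
+
+On the data path the guest's kick lands directly on the kick eventfd
+and wakes the kernel worker, which is exactly where the syscall and
+context-switch savings described above come from.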