The commit is pushed to "branch-rh9-5.14.0-70.22.1.vz9.17.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-70.22.1.vz9.17.9
------>
commit c5960b2c339abd671fc0bac6e57283615ff6f5cf
Author: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>
Date:   Fri Nov 11 12:55:55 2022 +0300

    drivers/vhost: allow polls to be bound to workers via vqs

    Allow vhost polls to be associated with vqs so we can queue them
    on assigned workers.
    If polls are not associated with specific vqs, queue them on the first
    worker.

    https://jira.sw.ru/browse/PSBM-139414
    Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>

    ======
    Patchset description:
    vhost-blk: in-kernel accelerator for virtio-blk guests

    Although QEMU virtio-blk is quite fast, there is still some room for
    improvements. Disk latency can be reduced if we handle virtio-blk
    requests in the host kernel so we avoid a lot of syscalls and context
    switches.
    The idea is quite simple - QEMU gives us a block device and we translate
    any incoming virtio requests into bio and push them into bdev.
    The biggest disadvantage of this vhost-blk flavor is raw format.
    Luckily Kirill Thai proposed a device mapper driver for QCOW2 format to
    attach files as block devices:
    https://www.spinics.net/lists/kernel/msg4292965.html

    Also by using kernel modules we can bypass the iothread limitation and
    finally scale block requests with cpus for high-performance devices.

    There have already been several attempts to write vhost-blk:

    Asias' version:       https://lkml.org/lkml/2012/12/1/174
    Badari's version:     https://lwn.net/Articles/379864/
    Vitaly's version:     https://lwn.net/Articles/770965/

    The main difference between them is the API used to access the backend
    file. The fastest one is Asias's version with bio flavor. It is also the
    most reviewed and has the most features. So the vhost_blk module is
    partially based on it. Multiple virtqueue support was added, some places
    reworked. Added support for several vhost workers.

    test setup and results:
    fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
    QEMU drive options: cache=none
    filesystem: xfs

    SSD:
                   | randread, IOPS | randwrite, IOPS |
    Host           |     95.8k      |      85.3k      |
    QEMU virtio    |     57.5k      |      79.4k      |
    QEMU vhost-blk |     95.6k      |      84.3k      |

    RAMDISK (vq == vcpu):
                     | randread, IOPS | randwrite, IOPS |
    virtio, 1vcpu    |     123k       |      129k       |
    virtio, 2vcpu    |     253k (??)  |      250k (??)  |
    virtio, 4vcpu    |     158k       |      154k       |
    vhost-blk, 1vcpu |     110k       |      113k       |
    vhost-blk, 2vcpu |     247k       |      252k       |
    vhost-blk, 8vcpu |     497k       |      469k       | *single kernel thread
    vhost-blk, 8vcpu |     730k       |      701k       | *two kernel threads

    v2:

    patch 1/10
     - removed unused VHOST_BLK_VQ
     - reworked bio handling a bit: now add all pages from a single iov into
       a single bio instead of allocating one bio per page
     - changed how to calculate sector incrementation
     - check move_iovec() in vhost_blk_req_handle()
     - remove snprintf check and better check ret from copy_to_iter for
       VIRTIO_BLK_ID_BYTES requests
     - discard vq request if vhost_blk_req_handle() returned negative code
     - forbid to change nonzero backend in vhost_blk_set_backend(). First of
       all, QEMU sets backend only once. Also if we want to change backend
       when we are already running requests we need to be much more careful
       in vhost_blk_handle_guest_kick() as it is not taking any references.
       If userspace wants to change backend that bad it can always reset
       device.
     - removed EXPERIMENTAL from Kconfig

    patch 3/10
     - don't bother with checking dev->workers[0].worker since dev->nworkers
       will always contain 0 in this case

    patch 6/10
     - Make code do what docs suggest. Previously the ioctl-supplied new
       number of workers was treated like an amount that should be added.
       Use the new number as a ceiling instead and add workers up to that
       number.

    v3:

    patch 1/10
     - reworked bio handling a bit - now create a new one only if the
       previous is full

    patch 2/10
     - set vq->worker = NULL in vhost_vq_reset()

    v4:

    patch 1/10
     - vhost_blk_req_done() now won't hide errors for multi-bio requests
     - vhost_blk_prepare_req() now better estimates bio_len
     - alloc bio for max pages_nr_total pages instead of nr_pages
     - added new ioctl VHOST_BLK_SET_SERIAL to set serial
     - rework flush algorithm a bit - now use two bins "new req" and
       "for flush" and swap them at the start of the flush
     - moved backing file dereference to vhost_blk_req_submit() and
       after request was added to flush bin to avoid race in
       vhost_blk_release(). Now even if we dropped backend and started
       flush the request will either be tracked by flush or be rolled back

    patch 2/10
     - moved vq->worker = NULL to patch #7 where this field is introduced.

    patch 7/10
     - Set vq->worker = NULL in vhost_vq_reset. This will fix both
       https://jira.sw.ru/browse/PSBM-142058
       https://jira.sw.ru/browse/PSBM-142852

    v5:

    patch 1/10
     - several codestyle/spacing fixes
     - added WARN_ON() for vhost_blk_flush

    https://jira.sw.ru/browse/PSBM-139414
    Reviewed-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>

    Andrey Zhadchenko (10):
      drivers/vhost: vhost-blk accelerator for virtio-blk guests
      drivers/vhost: use array to store workers
      drivers/vhost: adjust vhost to flush all workers
      drivers/vhost: rework attaching cgroups to be worker aware
      drivers/vhost: rework worker creation
      drivers/vhost: add ioctl to increase the number of workers
      drivers/vhost: assign workers to virtqueues
      drivers/vhost: add API to queue work at virtqueue worker
      drivers/vhost: allow polls to be bound to workers via vqs
      drivers/vhost: queue vhost_blk works at vq workers

    Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
---
 drivers/vhost/vhost.c | 24 ++++++++++++++++--------
 drivers/vhost/vhost.h |  4 +++-
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9a5e5b4dc4f0..ac204d2384a0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -170,7 +170,7 @@ static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
 	if (!(key_to_poll(key) & poll->mask))
 		return 0;
 
-	if (!poll->dev->use_worker)
+	if (!poll->vq->dev->use_worker)
 		work->fn(work);
 	else
 		vhost_poll_queue(poll);
@@ -185,19 +185,27 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 }
 EXPORT_SYMBOL_GPL(vhost_work_init);
 
-/* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     __poll_t mask, struct vhost_dev *dev)
+{
+	vhost_poll_init_vq(poll, fn, mask, dev->vqs[0]);
+}
+EXPORT_SYMBOL_GPL(vhost_poll_init);
+
+
+/* Init poll structure */
+void vhost_poll_init_vq(struct vhost_poll *poll, vhost_work_fn_t fn,
+		     __poll_t mask, struct vhost_virtqueue *vq)
 {
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
 	poll->mask = mask;
-	poll->dev = dev;
+	poll->vq = vq;
 	poll->wqh = NULL;
 
 	vhost_work_init(&poll->work, fn);
 }
-EXPORT_SYMBOL_GPL(vhost_poll_init);
+EXPORT_SYMBOL_GPL(vhost_poll_init_vq);
 
 /* Start polling a file. We add ourselves to file's wait queue. The caller must
  * keep a reference to a file until after vhost_poll_stop is called. */
@@ -287,7 +295,7 @@ EXPORT_SYMBOL_GPL(vhost_work_flush_vq);
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
 {
-	vhost_work_dev_flush(poll->dev);
+	vhost_work_flush_vq(poll->vq);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
@@ -322,7 +330,7 @@ EXPORT_SYMBOL_GPL(vhost_has_work);
 
 void vhost_poll_queue(struct vhost_poll *poll)
 {
-	vhost_work_queue(poll->dev, &poll->work);
+	vhost_work_queue_vq(poll->vq, &poll->work);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
@@ -572,8 +580,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 		mutex_init(&vq->mutex);
 		vhost_vq_reset(dev, vq);
 		if (vq->handle_kick)
-			vhost_poll_init(&vq->poll, vq->handle_kick,
-					EPOLLIN, dev);
+			vhost_poll_init_vq(&vq->poll, vq->handle_kick,
+					EPOLLIN, vq);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index dc7428c26cbe..4182fd7fceaf 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -40,7 +40,7 @@ struct vhost_poll {
 	wait_queue_entry_t	wait;
 	struct vhost_work	work;
 	__poll_t		mask;
-	struct vhost_dev	*dev;
+	struct vhost_virtqueue	*vq;
 };
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
@@ -49,6 +49,8 @@ bool vhost_has_work(struct vhost_dev *dev);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     __poll_t mask, struct vhost_dev *dev);
+void vhost_poll_init_vq(struct vhost_poll *poll, vhost_work_fn_t fn,
+		     __poll_t mask, struct vhost_virtqueue *vq);
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
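
Illustration only, not part of the patch: the dispatch rule this change introduces (a poll follows its vq; a poll created through the old vhost_poll_init() is bound to dev->vqs[0]) can be modelled in user space roughly as below. The struct names mirror the patch, but the fields, the poll_worker() helper, the "queued" counter, and the fallback to workers[0] when a vq has no worker are simplifying assumptions; the real queueing lives in vhost_work_queue_vq() from an earlier patch of the series, which is not shown here.

```c
/* User-space model only. Struct names follow the patch; everything
 * else (fields, helpers, counters) is a hypothetical simplification. */
#include <assert.h>
#include <stddef.h>

struct vhost_worker {
	int id;
	int queued;			/* number of works queued so far */
};

struct vhost_dev;

struct vhost_virtqueue {
	struct vhost_dev *dev;
	struct vhost_worker *worker;	/* NULL until a worker is assigned */
};

struct vhost_dev {
	struct vhost_virtqueue **vqs;
	int nvqs;
	struct vhost_worker *workers;	/* workers[0] acts as the default */
	int nworkers;
};

struct vhost_poll {
	struct vhost_virtqueue *vq;	/* replaces the old dev pointer */
};

/* Models vhost_poll_init_vq(): bind the poll to one specific vq. */
static void poll_init_vq(struct vhost_poll *poll, struct vhost_virtqueue *vq)
{
	poll->vq = vq;
}

/* Models vhost_poll_init(): no vq supplied, so use the first one. */
static void poll_init(struct vhost_poll *poll, struct vhost_dev *dev)
{
	assert(dev->nvqs > 0);
	poll_init_vq(poll, dev->vqs[0]);
}

/* Hypothetical helper: the vq's worker if assigned, else workers[0].
 * The NULL fallback is an assumption of this sketch. */
static struct vhost_worker *poll_worker(const struct vhost_poll *poll)
{
	struct vhost_virtqueue *vq = poll->vq;

	return vq->worker ? vq->worker : &vq->dev->workers[0];
}

/* Models vhost_poll_queue(): queue on the poll's worker, return its id. */
static int poll_queue(struct vhost_poll *poll)
{
	struct vhost_worker *w = poll_worker(poll);

	w->queued++;
	return w->id;
}
```

In this model a poll set up with poll_init() ends up on worker 0, while a poll set up with poll_init_vq() on a vq that owns worker 1 is queued there, which is the behaviour the commit message describes.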