The commit is pushed to "branch-rh9-5.14.0-70.22.1.vz9.17.x-ovz" and will appear
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-70.22.1.vz9.17.9

------>
commit 598789c9751ad68eaff267939412115deff5fa66
Author: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>
Date:   Fri Nov 11 12:55:48 2022 +0300
    drivers/vhost: use array to store workers

    We want to support several vhost workers. The first step is to rework
    vhost to use an array of workers rather than a single pointer. Update
    the creation and cleanup routines.

    https://jira.sw.ru/browse/PSBM-139414
    Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>

    ======
    Patchset description:
    vhost-blk: in-kernel accelerator for virtio-blk guests

    Although QEMU virtio-blk is quite fast, there is still some room for
    improvement. Disk latency can be reduced if we handle virtio-blk
    requests in the host kernel, so we avoid a lot of syscalls and
    context switches.

    The idea is quite simple: QEMU gives us a block device, and we
    translate any incoming virtio requests into bios and push them into
    the bdev. The biggest disadvantage of this vhost-blk flavor is the
    raw format. Luckily, Kirill Thai proposed a device mapper driver for
    the QCOW2 format to attach files as block devices:
    https://www.spinics.net/lists/kernel/msg4292965.html

    Also, by using kernel modules we can bypass the iothread limitation
    and finally scale block requests with CPUs for high-performance
    devices.

    There have already been several attempts to write vhost-blk:

    Asias' version:   https://lkml.org/lkml/2012/12/1/174
    Badari's version: https://lwn.net/Articles/379864/
    Vitaly's version: https://lwn.net/Articles/770965/

    The main difference between them is the API used to access the
    backend file. The fastest one is Asias' version with the bio flavor.
    It is also the most reviewed and has the most features, so the
    vhost_blk module is partially based on it. Multiple virtqueue
    support was added and some places were reworked. Added support for
    several vhost workers.
test setup and results:
fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs

SSD:
                 | randread, IOPS | randwrite, IOPS |
Host             |          95.8k |           85.3k |
QEMU virtio      |          57.5k |           79.4k |
QEMU vhost-blk   |          95.6k |           84.3k |

RAMDISK (vq == vcpu):
                 | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    |           123k |            129k |
virtio, 2vcpu    |      253k (??) |       250k (??) |
virtio, 4vcpu    |           158k |            154k |
vhost-blk, 1vcpu |           110k |            113k |
vhost-blk, 2vcpu |           247k |            252k |
vhost-blk, 8vcpu |           497k |            469k |  *single kernel thread
vhost-blk, 8vcpu |           730k |            701k |  *two kernel threads

v2:
patch 1/10
 - removed unused VHOST_BLK_VQ
 - reworked bio handling a bit: now add all pages from a single iov
   into a single bio instead of allocating one bio per page
 - changed how the sector increment is calculated
 - check move_iovec() in vhost_blk_req_handle()
 - removed the snprintf check and better check the return value of
   copy_to_iter for VIRTIO_BLK_ID_BYTES requests
 - discard the vq request if vhost_blk_req_handle() returned a
   negative code
 - forbid changing a nonzero backend in vhost_blk_set_backend().
   First of all, QEMU sets the backend only once. Also, if we want to
   change the backend while requests are already running, we need to
   be much more careful in vhost_blk_handle_guest_kick(), as it does
   not take any references. If userspace wants to change the backend
   that badly, it can always reset the device.
 - removed EXPERIMENTAL from Kconfig
patch 3/10
 - don't bother checking dev->workers[0].worker, since dev->nworkers
   will always contain 0 in this case
patch 6/10
 - make the code do what the docs suggest. Previously the
   ioctl-supplied new number of workers was treated as an amount to
   be added. Use the new number as a ceiling instead and add workers
   up to that number.
v3:
patch 1/10
 - reworked bio handling a bit: now create a new bio only if the
   previous one is full
patch 2/10
 - set vq->worker = NULL in vhost_vq_reset()

v4:
patch 1/10
 - vhost_blk_req_done() now won't hide errors for multi-bio requests
 - vhost_blk_prepare_req() now better estimates bio_len
 - alloc bio for max pages_nr_total pages instead of nr_pages
 - added new ioctl VHOST_BLK_SET_SERIAL to set the serial
 - reworked the flush algorithm a bit: now use two bins, "new req"
   and "for flush", and swap them at the start of the flush
 - moved the backing file dereference to vhost_blk_req_submit() and
   after the request was added to the flush bin, to avoid a race in
   vhost_blk_release(). Now even if we dropped the backend and
   started a flush, the request will either be tracked by the flush
   or be rolled back
patch 2/10
 - moved vq->worker = NULL to patch #7, where this field is
   introduced
patch 7/10
 - set vq->worker = NULL in vhost_vq_reset(). This will fix both
   https://jira.sw.ru/browse/PSBM-142058
   https://jira.sw.ru/browse/PSBM-142852

v5:
patch 1/10
 - several codestyle/spacing fixes
 - added WARN_ON() for vhost_blk_flush

https://jira.sw.ru/browse/PSBM-139414
Reviewed-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>

Andrey Zhadchenko (10):
  drivers/vhost: vhost-blk accelerator for virtio-blk guests
  drivers/vhost: use array to store workers
  drivers/vhost: adjust vhost to flush all workers
  drivers/vhost: rework attaching cgroups to be worker aware
  drivers/vhost: rework worker creation
  drivers/vhost: add ioctl to increase the number of workers
  drivers/vhost: assign workers to virtqueues
  drivers/vhost: add API to queue work at virtqueue worker
  drivers/vhost: allow polls to be bound to workers via vqs
  drivers/vhost: queue vhost_blk works at vq workers

Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
---
 drivers/vhost/vhost.c | 75 ++++++++++++++++++++++++++++++++++++---------------
 drivers/vhost/vhost.h | 10 ++++++-
 2 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/drivers/vhost/vhost.c
b/drivers/vhost/vhost.c
index a0bfc77c6a43..968601325a37 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -231,11 +231,24 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
+static void vhost_work_queue_at_worker(struct vhost_worker *w,
+				       struct vhost_work *work)
+{
+	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
+		/* We can only add the work to the list after we're
+		 * sure it was not in the list.
+		 * test_and_set_bit() implies a memory barrier.
+		 */
+		llist_add(&work->node, &w->work_list);
+		wake_up_process(w->worker);
+	}
+}
+
 void vhost_work_dev_flush(struct vhost_dev *dev)
 {
 	struct vhost_flush_struct flush;
 
-	if (dev->worker) {
+	if (dev->workers[0].worker) {
 		init_completion(&flush.wait_event);
 		vhost_work_init(&flush.work, vhost_flush_work);
 
@@ -255,17 +268,12 @@ EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
-	if (!dev->worker)
+	struct vhost_worker *w = &dev->workers[0];
+
+	if (!w->worker)
 		return;
 
-	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
-		/* We can only add the work to the list after we're
-		 * sure it was not in the list.
-		 * test_and_set_bit() implies a memory barrier.
-		 */
-		llist_add(&work->node, &dev->work_list);
-		wake_up_process(dev->worker);
-	}
+	vhost_work_queue_at_worker(w, work);
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
@@ -341,9 +349,29 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	__vhost_vq_meta_reset(vq);
 }
 
+static void vhost_worker_reset(struct vhost_worker *w)
+{
+	init_llist_head(&w->work_list);
+	w->worker = NULL;
+}
+
+void vhost_cleanup_workers(struct vhost_dev *dev)
+{
+	int i;
+
+	for (i = 0; i < dev->nworkers; ++i) {
+		WARN_ON(!llist_empty(&dev->workers[i].work_list));
+		kthread_stop(dev->workers[i].worker);
+		vhost_worker_reset(&dev->workers[i]);
+	}
+
+	dev->nworkers = 0;
+}
+
 static int vhost_worker(void *data)
 {
-	struct vhost_dev *dev = data;
+	struct vhost_worker *w = data;
+	struct vhost_dev *dev = w->dev;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
 
@@ -358,7 +386,7 @@ static int vhost_worker(void *data)
 			break;
 		}
 
-		node = llist_del_all(&dev->work_list);
+		node = llist_del_all(&w->work_list);
 		if (!node)
 			schedule();
 
@@ -481,7 +509,6 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->umem = NULL;
 	dev->iotlb = NULL;
 	dev->mm = NULL;
-	dev->worker = NULL;
 	dev->iov_limit = iov_limit;
 	dev->weight = weight;
 	dev->byte_weight = byte_weight;
@@ -493,6 +520,11 @@ void vhost_dev_init(struct vhost_dev *dev,
 	INIT_LIST_HEAD(&dev->pending_list);
 	spin_lock_init(&dev->iotlb_lock);
 
+	dev->nworkers = 0;
+	for (i = 0; i < VHOST_MAX_WORKERS; ++i) {
+		dev->workers[i].dev = dev;
+		vhost_worker_reset(&dev->workers[i]);
+	}
 
 	for (i = 0; i < dev->nvqs; ++i) {
 		vq = dev->vqs[i];
@@ -602,7 +634,8 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 		goto err_worker;
 	}
 
-	dev->worker = worker;
+	dev->workers[0].worker = worker;
+	dev->nworkers = 1;
 	wake_up_process(worker); /* avoid contributing to loadavg */
 
 	err = vhost_attach_cgroups(dev);
@@ -616,9 +649,10 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	return 0;
 err_cgroup:
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
+	dev->nworkers = 0;
+	if (dev->workers[0].worker) {
+		kthread_stop(dev->workers[0].worker);
+		dev->workers[0].worker = NULL;
 	}
 err_worker:
 	vhost_detach_mm(dev);
@@ -701,6 +735,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 			eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
 		vhost_vq_reset(dev, dev->vqs[i]);
 	}
+
 	vhost_dev_free_iovecs(dev);
 	if (dev->log_ctx)
 		eventfd_ctx_put(dev->log_ctx);
@@ -712,10 +747,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	WARN_ON(!llist_empty(&dev->work_list));
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
+	if (dev->use_worker) {
+		vhost_cleanup_workers(dev);
 		dev->kcov_handle = 0;
 	}
 	vhost_detach_mm(dev);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 638bb640d6b4..634ea828cbba 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,6 +25,13 @@ struct vhost_work {
 	unsigned long flags;
 };
 
+#define VHOST_MAX_WORKERS 4
+struct vhost_worker {
+	struct task_struct *worker;
+	struct llist_head work_list;
+	struct vhost_dev *dev;
+};
+
 /* Poll a file (eventfd or socket) */
 /* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
@@ -149,7 +156,8 @@ struct vhost_dev {
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
 	struct llist_head work_list;
-	struct task_struct *worker;
+	struct vhost_worker workers[VHOST_MAX_WORKERS];
+	int nworkers;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel