On 2023/7/20 16:53, Lei Yang wrote:
> According to the steps provided by Hawkins, I tested two cases, one
> with this patch series applied and one without it. All tests were run
> on real hardware.
> Case 1, without this series
> Source: qemu-system-x86_64: vhost_vdpa_net_load() = 23308 us
> Dest: qemu-system-x86_64: vhost_vdpa_net_load() = 23296 us
>
> Case 2, with this series applied
> Source: qemu-system-x86_64: vhost_vdpa_net_load() = 6558 us
> Dest: qemu-system-x86_64: vhost_vdpa_net_load() = 6539 us
>
> Tested-by: Lei Yang <leiy...@redhat.com>
>
>
> On Thu, Jul 20, 2023 at 6:54 AM Lei Yang <leiy...@redhat.com> wrote:
>>
>> On Wed, Jul 19, 2023 at 11:25 PM Hawkins Jiawei <yin31...@gmail.com> wrote:
>>>
>>> On 2023/7/19 20:44, Lei Yang wrote:
>>>> Hello Hawkins and Michael,
>>>>
>>>> It looks like there are big changes related to vp_vdpa; therefore, if
>>>> needed, QE can test this series in QE's environment before the patch is
>>>
>>> Hi Lei,
>>>
>>> This patch series does not modify the code of vp_vdpa. Instead, it only
>>> modifies how QEMU sends SVQ control commands to the vdpa device.
>>>
>> Hi Hawkins,
>>
>>> Considering that the behavior of the vp_vdpa device differs from that
>>> of real vdpa hardware, would it be possible for you to test this patch
>>> series on a real vdpa device?
>>
>> Yes, there is a hardware device to test it, I will update the test
>> results ASAP.
>>
>> BR
>> Lei
>>>
>>> Thanks!
>>>
>>>
>>>> merged, and provide the result.
>>>>
>>>> BR
>>>> Lei
>>>>
>>>>
>>>> On Wed, Jul 19, 2023 at 8:37 PM Hawkins Jiawei <yin31...@gmail.com> wrote:
>>>>>
>>>>> On 2023/7/19 17:11, Michael S. Tsirkin wrote:
>>>>>> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
>>>>>>> This patchset allows QEMU to delay polling and checking the device
>>>>>>> used buffer until either the SVQ is full or the control commands
>>>>>>> shadow buffers are full, instead of polling and checking immediately
>>>>>>> after sending each SVQ control command, so that QEMU can send all the
>>>>>>> SVQ control commands in parallel, which gives a better performance
>>>>>>> improvement.
>>>>>>>
>>>>>>> I use a vp_vdpa device to simulate a vdpa device, and create 4094
>>>>>>> VLANs in the guest to build a test environment for sending multiple
>>>>>>> CVQ state load commands. This patch series can improve latency from
>>>>>>> 10023 us to 8697 us for about 4099 CVQ state load commands, about
>>>>>>> 0.32 us per command.
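For readers following along, a minimal, self-contained C sketch of the flow
described above might look like the following. It is not the series' actual
code: `SVQ_SLOTS`, `svq_add()` and `svq_poll_n()` are hypothetical stand-ins,
and the shadow-buffer capacity check is folded into the single slot counter
for brevity. The point it illustrates is that the device is only polled when
the queue can take no more commands, rather than after every single command.

```c
/*
 * Minimal sketch of the batched CVQ load flow described above.
 * All names (SVQ_SLOTS, svq_add, svq_poll_n, ...) are hypothetical
 * illustrations, not the actual QEMU/vhost-vdpa API.
 */
#include <stdio.h>

#define SVQ_SLOTS  64       /* pretend SVQ descriptor capacity */
#define N_COMMANDS 4099     /* e.g. 4094 VLAN commands plus a few others */

/* Pretend to place one control command in the shadow virtqueue. */
static void svq_add(int cmd_id)
{
    (void)cmd_id;           /* a real implementation would fill descriptors */
}

/* Pretend to poll the device until `n` in-flight commands are used. */
static void svq_poll_n(int n)
{
    printf("polled device for %d command(s)\n", n);
}

int main(void)
{
    int in_flight = 0;

    for (int cmd = 0; cmd < N_COMMANDS; cmd++) {
        svq_add(cmd);
        in_flight++;

        /*
         * Key idea: only wait for the device when the queue (or the
         * shadow buffers) cannot take more commands, instead of
         * waiting after every single command.
         */
        if (in_flight == SVQ_SLOTS) {
            svq_poll_n(in_flight);
            in_flight = 0;
        }
    }

    /* Flush whatever is still pending at the end of the load. */
    if (in_flight > 0) {
        svq_poll_n(in_flight);
    }
    return 0;
}
```

Under these assumptions, a load of 4099 commands waits on the device roughly
4099 / 64 ≈ 65 times instead of 4099 times, which is where the batching saves
latency.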
It appears that the performance timing with the vp_vdpa device in this setup
is not consistently stable. I retested this patch series with the same steps
on the vp_vdpa device. With this patch series, in the majority of cases, the
time for the CVQ state load commands is around 8000 us ~ 10000 us. Without
this patch series, in the majority of cases, the time for the CVQ state load
commands is around 14000 us ~ 20000 us.

Thanks!

>>>>>>
>>>>>> Looks like a tiny improvement.
>>>>>> At the same time we have O(n^2) behaviour with memory mappings.
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Thanks for your review.
>>>>>
>>>>> I wonder why you say "we have O(n^2) behaviour with memory mappings"
>>>>> here?
>>>>>
>>>>> From my understanding, QEMU maps two page-size buffers as control
>>>>> commands shadow buffers at device startup. These buffers are then used
>>>>> to cache SVQ control commands: QEMU fills them with the bytes of
>>>>> multiple SVQ control commands, and flushes them when the SVQ
>>>>> descriptors are full or these control commands shadow buffers reach
>>>>> their capacity.
>>>>>
>>>>> QEMU repeats this process until all CVQ state load commands have been
>>>>> sent during loading.
>>>>>
>>>>> In this loading process, only the translation of the control commands
>>>>> shadow buffers should involve the memory mappings, which should be
>>>>> O(log n) behaviour to my understanding (please correct me if I am
>>>>> wrong).
>>>>>
>>>>>> Not saying we must not do this but I think it's worth
>>>>>> checking where the bottleneck is. My guess would be
>>>>>> vp_vdpa is not doing things in parallel. Want to try fixing that
>>>>>
>>>>> As for "vp_vdpa is not doing things in parallel", do you mean
>>>>> the vp_vdpa device cannot process QEMU's SVQ control commands
>>>>> in parallel?
>>>>>
>>>>> In this situation, I will try to use real vdpa hardware to
>>>>> test the performance of this patch series.
>>>>>
>>>>>> to see how far it can be pushed?
>>>>>
>>>>> Currently, I am involved in the "Add virtio-net Control Virtqueue state
>>>>> restore support" project in Google Summer of Code. Because I am
>>>>> uncertain about the time it will take to fix that problem in the
>>>>> vp_vdpa device, I prefer to complete the GSoC project first.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> Note that this patch should be based on
>>>>>>> the patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
>>>>>>>
>>>>>>> [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html
>>>>>>>
>>>>>>> TestStep
>>>>>>> ========
>>>>>>> 1. regression testing using the vp-vdpa device
>>>>>>> - For L0 guest, boot QEMU with two virtio-net-pci net devices with
>>>>>>> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>>>>>>     -device virtio-net-pci,disable-legacy=on,disable-modern=off,
>>>>>>> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
>>>>>>> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
>>>>>>>
>>>>>>> - For L1 guest, apply the patch series and compile the source code,
>>>>>>> start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
>>>>>>> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>>>>>>     -netdev type=vhost-vdpa,x-svq=true,...
>>>>>>>     -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
>>>>>>> ctrl_rx=on,ctrl_rx_extra=on...
>>>>>>>
>>>>>>> - For L2 source guest, run the following bash command:
>>>>>>> ```bash
>>>>>>> #!/bin/sh
>>>>>>>
>>>>>>> for idx1 in {0..9}
>>>>>>> do
>>>>>>>   for idx2 in {0..9}
>>>>>>>   do
>>>>>>>     for idx3 in {0..6}
>>>>>>>     do
>>>>>>>       ip link add macvlan$idx1$idx2$idx3 link eth0 \
>>>>>>>         address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
>>>>>>>       ip link set macvlan$idx1$idx2$idx3 up
>>>>>>>     done
>>>>>>>   done
>>>>>>> done
>>>>>>> ```
>>>>>>> - Execute the live migration in the L2 source monitor
>>>>>>>
>>>>>>> - Result
>>>>>>>   * with this series, QEMU should not trigger any error or warning.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2. perf using the vp-vdpa device
>>>>>>> - For L0 guest, boot QEMU with two virtio-net-pci net devices with
>>>>>>> `ctrl_vq`, `ctrl_vlan` features on, command line like:
>>>>>>>     -device virtio-net-pci,disable-legacy=on,disable-modern=off,
>>>>>>> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
>>>>>>> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
>>>>>>>
>>>>>>> - For L1 guest, apply the patch series, then apply an additional
>>>>>>> patch to record the load time in microseconds, as follows:
>>>>>>> ```diff
>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>> index 6b958d6363..501b510fd2 100644
>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>> @@ -295,7 +295,10 @@ static int vhost_net_start_one(struct vhost_net *net,
>>>>>>>      }
>>>>>>>
>>>>>>>      if (net->nc->info->load) {
>>>>>>> +        int64_t start_us = g_get_monotonic_time();
>>>>>>>          r = net->nc->info->load(net->nc);
>>>>>>> +        error_report("vhost_vdpa_net_load() = %ld us",
>>>>>>> +                     g_get_monotonic_time() - start_us);
>>>>>>>          if (r < 0) {
>>>>>>>              goto fail;
>>>>>>>          }
>>>>>>> ```
>>>>>>>
>>>>>>> - For L1 guest, compile the code, and start QEMU with two vdpa devices
>>>>>>> with svq mode on, enable the `ctrl_vq`, `ctrl_vlan` features on,
>>>>>>> command line like:
>>>>>>>     -netdev type=vhost-vdpa,x-svq=true,...
>>>>>>>     -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
>>>>>>> ctrl_vlan=on...
>>>>>>>
>>>>>>> - For L2 source guest, run the following bash command:
>>>>>>> ```bash
>>>>>>> #!/bin/sh
>>>>>>>
>>>>>>> for idx in {1..4094}
>>>>>>> do
>>>>>>>   ip link add link eth0 name vlan$idx type vlan id $idx
>>>>>>> done
>>>>>>> ```
>>>>>>>
>>>>>>> - Wait for some time, then execute the live migration in the L2 source
>>>>>>> monitor
>>>>>>>
>>>>>>> - Result
>>>>>>>   * with this series, QEMU should not trigger any warning
>>>>>>> or error except something like "vhost_vdpa_net_load() = 8697 us"
>>>>>>>   * without this series, QEMU should not trigger any warning
>>>>>>> or error except something like "vhost_vdpa_net_load() = 10023 us"
>>>>>>>
>>>>>>> ChangeLog
>>>>>>> =========
>>>>>>> v3:
>>>>>>>   - refactor vhost_svq_poll() to accept cmds_in_flight,
>>>>>>> suggested by Jason and Eugenio
>>>>>>>   - refactor vhost_vdpa_net_cvq_add() so that the control command
>>>>>>> buffers are not tied to `s->cvq_cmd_out_buffer` and `s->status`,
>>>>>>> so we can reuse it, suggested by Eugenio
>>>>>>>   - poll and check when the SVQ is full or the control commands
>>>>>>> shadow buffers are full
>>>>>>>
>>>>>>> v2: https://lore.kernel.org/all/cover.1683371965.git.yin31...@gmail.com/
>>>>>>>   - recover accidentally deleted rows
>>>>>>>   - remove extra newline
>>>>>>>   - refactor `need_poll_len` to `cmds_in_flight`
>>>>>>>   - return -EINVAL when vhost_svq_poll() returns 0 or the check
>>>>>>> on buffers written by the device fails
>>>>>>>   - change the type of `in_cursor`, and refactor the
>>>>>>> code for updating the cursor
>>>>>>>   - return directly when vhost_vdpa_net_load_{mac,mq}()
>>>>>>> returns a failure in vhost_vdpa_net_load()
>>>>>>>
>>>>>>> v1: https://lore.kernel.org/all/cover.1681732982.git.yin31...@gmail.com/
>>>>>>>
>>>>>>> Hawkins Jiawei (8):
>>>>>>>   vhost: Add argument to vhost_svq_poll()
>>>>>>>   vdpa: Use iovec for vhost_vdpa_net_cvq_add()
>>>>>>>   vhost: Expose vhost_svq_available_slots()
>>>>>>>   vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()
>>>>>>>   vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()
>>>>>>>   vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
>>>>>>>   vdpa: Add cursors to vhost_vdpa_net_loadx()
>>>>>>>   vdpa: Send cvq state load commands in parallel
>>>>>>>
>>>>>>>  hw/virtio/vhost-shadow-virtqueue.c |  38 ++--
>>>>>>>  hw/virtio/vhost-shadow-virtqueue.h |   3 +-
>>>>>>>  net/vhost-vdpa.c                   | 354 ++++++++++++++++++-----------
>>>>>>>  3 files changed, 249 insertions(+), 146 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.25.1
>>>>>>
>>>>>
>>>>
>>>
>