Hi Dragos, Si-Wei,

1. I applied [0] [1] [2] to the downstream kernel and then tested
hotplug/unplug; this bug still exists.

[0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
[1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
[2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")

2. Si-Wei mentioned that the two patches [1] [2] below have been merged
into the qemu master branch, but based on the test result they do not
help fix this bug.

[1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
[2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")

3. I found that the step that triggers the unhealthy report from the
firmware is simply booting up the guest with a qemu that includes the
current patches. The host dmesg prints the unhealthy info immediately
after the guest boots.

Thanks
Lei

On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
> Hi Lei,
>
> On 3/18/2025 7:06 AM, Lei Yang wrote:
> > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasow...@redhat.com> wrote:
> >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiy...@redhat.com> wrote:
> >>> Hi Jonah
> >>>
> >>> I tested this series with the vhost_vdpa device based on a Mellanox
> >>> ConnectX-6 DX NIC and hit a host kernel crash. This problem is
> >>> easier to reproduce under the hotplug/unplug device scenario.
> >>> For the core dump messages please review the attachment.
> >>> FW version:
> >>> # flint -d 0000:0d:00.0 q | grep Version
> >>> FW Version:      22.44.1036
> >>> Product Version: 22.44.1036
> >> The trace looks more like a mlx5e driver bug rather than vDPA?
> >>
> >> [ 3256.256707] Call Trace:
> >> [ 3256.256708]  <IRQ>
> >> [ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
> >> [ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
> >> [ 3256.256715]  ? __build_skb+0x4a/0x60
> >> [ 3256.256719]  ? __die_body.cold+0x8/0xd
> >> [ 3256.256720]  ? die_addr+0x39/0x60
> >> [ 3256.256725]  ? exc_general_protection+0x1ec/0x420
> >> [ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
> >> [ 3256.256736]  ? __build_skb_around+0x8c/0xf0
> >> [ 3256.256738]  __build_skb+0x4a/0x60
> >> [ 3256.256740]  build_skb+0x11/0xa0
> >> [ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> >> [ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> >> [ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> >> [ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> >> [ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> >> [ 3256.257226]  __napi_poll+0x29/0x170
> >> [ 3256.257229]  net_rx_action+0x29c/0x370
> >> [ 3256.257231]  handle_softirqs+0xce/0x270
> >> [ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
> >> [ 3256.257238]  common_interrupt+0x80/0xa0
> >>
> > Hi Jason
> >
> >> Which kernel tree did you use? Can you please try net.git?
> > I used the latest 9.6 downstream kernel and upstream qemu (with this
> > series of patches applied) to test this scenario.
> > First, based on my test results this bug is related to this series of
> > patches; the conclusion is based on the following results (all test
> > results are based on the above-mentioned NIC driver):
> > Case 1: downstream kernel + downstream qemu-kvm - pass
> > Case 2: downstream kernel + upstream qemu (without this series of
> > patches) - pass
> > Case 3: downstream kernel + upstream qemu (with this series of
> > patches) - failed, reproduce ratio 100%
> Just as Dragos replied earlier, the firmware was already in a bogus
> state before the panic, and I also suspect it has something to do with
> various bugs in the downstream kernel. You have to apply the 3 patches
> to the downstream kernel before you kick off the relevant tests
> again. Please pay special attention to which specific command or step
> triggers the unhealthy report from firmware, and let us know if you
> still run into any of them.
>
> In addition, as you seem to be testing the device hotplug and unplug
> use cases, the latest qemu should have the related fixes below [1][2];
> in case they are missed somehow, that might also end up with a bad
> firmware state to some extent. Just FYI.
>
> [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
>
> Thanks,
> -Siwei
>
> > Then I also tried to test it with the net.git tree, but it hit a
> > host kernel panic when rebooting the host after compiling. For the
> > call trace info please review the following messages:
> > [    9.902851] No filesystem could mount root, tried:
> > [    9.902851]
> > [    9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> > fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > [    9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> > 6.14.0-rc6+ #3
> > [    9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> > 1.3.2 03/28/2023
> > [    9.935876] Call Trace:
> > [    9.938332]  <TASK>
> > [    9.940436]  panic+0x356/0x380
> > [    9.943513]  mount_root_generic+0x2e7/0x300
> > [    9.947717]  prepare_namespace+0x65/0x270
> > [    9.951731]  kernel_init_freeable+0x2e2/0x310
> > [    9.956105]  ? __pfx_kernel_init+0x10/0x10
> > [    9.960221]  kernel_init+0x16/0x1d0
> > [    9.963715]  ret_from_fork+0x2d/0x50
> > [    9.967303]  ? __pfx_kernel_init+0x10/0x10
> > [    9.971404]  ret_from_fork_asm+0x1a/0x30
> > [    9.975348]  </TASK>
> > [    9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [   10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> > mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> > unknown-block(0,0) ]---
> >
> > # git log -1
> > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> > origin/main, origin/HEAD)
> > Merge: 8f7617f45009 2409fa66e29a
> > Author: Linus Torvalds <torva...@linux-foundation.org>
> > Date:   Thu Mar 13 07:58:48 2025 -1000
> >
> >     Merge tag 'net-6.14-rc7' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> >
> >
> > Thanks
> >
> > Lei
> >> Thanks
> >>
> >>> Best Regards
> >>> Lei
> >>>
> >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
> >>>> Current memory operations like pinning may take a lot of time at the
> >>>> destination. Currently they are done after the source of the
> >>>> migration is stopped, and before the workload is resumed at the
> >>>> destination. This is a period where neither traffic can flow nor
> >>>> the VM workload can continue (downtime).
> >>>>
> >>>> We can do better, as we know the memory layout of the guest RAM at
> >>>> the destination from the moment that all devices are initialized.
> >>>> So moving that operation earlier allows QEMU to communicate the maps
> >>>> to the kernel while the workload is still running in the source, so
> >>>> Linux can start mapping them.
> >>>>
> >>>> As a small drawback, there is a time in the initialization where QEMU
> >>>> cannot respond to QMP etc. By some testing, this time is about
> >>>> 0.2 seconds. This may be further reduced (or increased) depending on
> >>>> the vdpa driver and the platform hardware, and it is dominated by the
> >>>> cost of memory pinning.
> >>>>
> >>>> This matches the time that we move out of the so-called downtime
> >>>> window. The downtime is measured by checking the trace timestamps
> >>>> from the moment the source suspends the device to the moment the
> >>>> destination starts the eighth and last virtqueue pair. For a 39G
> >>>> guest, it goes from ~2.2526 secs to 2.0949.
> >>>>
> >>>> Future directions on top of this series may include moving more
> >>>> things ahead of the migration time, like setting DRIVER_OK or
> >>>> performing actual iterative migration of virtio-net devices.
> >>>>
> >>>> Comments are welcome.
> >>>>
> >>>> This series is a different approach from series [1]. As the title
> >>>> does not reflect the changes anymore, please refer to the previous
> >>>> one to know the series history.
> >>>>
> >>>> This series is based on [2]; it must be applied after it.
> >>>>
> >>>> [Jonah Palmer]
> >>>> This series was rebased after [3] was pulled in, as [3] was a
> >>>> prerequisite fix for this series.
> >>>>
> >>>> v3:
> >>>> ---
> >>>> * Rebase
> >>>>
> >>>> v2:
> >>>> ---
> >>>> * Move the memory listener registration to the vhost_vdpa_set_owner
> >>>>   function.
> >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> >>>>
> >>>> v1 at
> >>>> https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> >>>>
> >>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-epere...@redhat.com/
> >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> >>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.pal...@oracle.com/
> >>>>
> >>>> Eugenio Pérez (7):
> >>>>   vdpa: check for iova tree initialized at net_client_start
> >>>>   vdpa: reorder vhost_vdpa_set_backend_cap
> >>>>   vdpa: set backend capabilities at vhost_vdpa_init
> >>>>   vdpa: add listener_registered
> >>>>   vdpa: reorder listener assignment
> >>>>   vdpa: move iova_tree allocation to net_vhost_vdpa_init
> >>>>   vdpa: move memory listener register to vhost_vdpa_init
> >>>>
> >>>>  hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
> >>>>  include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> >>>>  net/vhost-vdpa.c               | 34 ++----------
> >>>>  3 files changed, 88 insertions(+), 66 deletions(-)
> >>>>
> >>>> --
> >>>> 2.43.5
> >>>>
> >>>>
>