On Thu, Mar 20, 2025 at 11:48 PM Dragos Tatulea <dtatu...@nvidia.com> wrote:
>
> Hi Lei,
>
> On 03/20, Lei Yang wrote:
> > Hi Dragos, Si-Wei
> >
> > 1. I applied [0] [1] [2] to the downstream kernel and then tested
> > hotplug/unplug; this bug still exists.
> >
> > [0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
> > [1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
> > [2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
> >
> > 2. Si-Wei mentioned that two patches [1] [2] have been merged into the
> > qemu master branch, but based on the test results they do not help fix
> > this bug.
> > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> >
> > 3. I found that the step which triggers the unhealthy report from the
> > firmware is simply booting the guest when using qemu with the current
> > patches. The host dmesg prints the unhealthy info immediately after the
> > guest boots.
> >
Hi Dragos

> Did you set the locked memory to unlimited before (ulimit -l unlimited)?
> This could also be the cause for the FW issue.

Yes, I did. I executed it (ulimit -l unlimited) before booting up the guest.

Thanks
Lei

>
> Thanks,
> Dragos
>
> > Thanks
> > Lei
> >
> >
> > On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
> > >
> > > Hi Lei,
> > >
> > > On 3/18/2025 7:06 AM, Lei Yang wrote:
> > > > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasow...@redhat.com> wrote:
> > > >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiy...@redhat.com> wrote:
> > > >>> Hi Jonah
> > > >>>
> > > >>> I tested this series with the vhost_vdpa device based on a Mellanox
> > > >>> ConnectX-6 DX nic and hit a host kernel crash. This problem is easier
> > > >>> to reproduce under the hotplug/unplug device scenario.
> > > >>> For the core dump messages please review the attachment.
> > > >>> FW version:
> > > >>> # flint -d 0000:0d:00.0 q |grep Version
> > > >>> FW Version:        22.44.1036
> > > >>> Product Version:   22.44.1036
> > > >> The trace looks more like a mlx5e driver bug rather than vDPA?
> > > >>
> > > >> [ 3256.256707] Call Trace:
> > > >> [ 3256.256708]  <IRQ>
> > > >> [ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
> > > >> [ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
> > > >> [ 3256.256715]  ? __build_skb+0x4a/0x60
> > > >> [ 3256.256719]  ? __die_body.cold+0x8/0xd
> > > >> [ 3256.256720]  ? die_addr+0x39/0x60
> > > >> [ 3256.256725]  ? exc_general_protection+0x1ec/0x420
> > > >> [ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
> > > >> [ 3256.256736]  ? __build_skb_around+0x8c/0xf0
> > > >> [ 3256.256738]  __build_skb+0x4a/0x60
> > > >> [ 3256.256740]  build_skb+0x11/0xa0
> > > >> [ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > > >> [ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > > >> [ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> > > >> [ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > > >> [ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > > >> [ 3256.257226]  __napi_poll+0x29/0x170
> > > >> [ 3256.257229]  net_rx_action+0x29c/0x370
> > > >> [ 3256.257231]  handle_softirqs+0xce/0x270
> > > >> [ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
> > > >> [ 3256.257238]  common_interrupt+0x80/0xa0
> > > >>
> > > > Hi Jason
> > > >
> > > >> Which kernel tree did you use? Can you please try net.git?
> > > > I used the latest 9.6 downstream kernel and upstream qemu (with this
> > > > series of patches applied) to test this scenario.
> > > > Based on my test results this bug is related to this series of
> > > > patches; the conclusion is based on the following results (all test
> > > > results use the above mentioned nic driver):
> > > > Case 1: downstream kernel + downstream qemu-kvm - pass
> > > > Case 2: downstream kernel + upstream qemu (does not include this series of patches) - pass
> > > > Case 3: downstream kernel + upstream qemu (includes this series of patches) - failed, reproduce ratio 100%
> > > Just as Dragos replied earlier, the firmware was already in a bogus
> > > state before the panic, and I suspect it has something to do with
> > > various bugs in the downstream kernel. You have to apply the 3 patches
> > > to the downstream kernel before you kick off the relevant tests again.
> > > Please pay special attention to which specific command or step
> > > triggers the unhealthy report from firmware, and let us know if you
> > > still run into any of them.
> > >
> > > In addition, you seem to be testing the device hot plug and unplug
> > > use cases, for which the latest qemu should have the related fixes
> > > below [1][2]; in case they are missed somehow it might also end up
> > > with a bad firmware state to some extent. Just fyi.
> > >
> > > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> > >
> > > Thanks,
> > > -Siwei
> > >
> > > > Then I also tried to test it with the net.git tree, but it hits a
> > > > host kernel panic when rebooting the host after compiling. For the
> > > > call trace info please review the following messages:
> > > > [ 9.902851] No filesystem could mount root, tried:
> > > > [ 9.902851]
> > > > [ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > > > [ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
> > > > [ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS 1.3.2 03/28/2023
> > > > [ 9.935876] Call Trace:
> > > > [ 9.938332]  <TASK>
> > > > [ 9.940436]  panic+0x356/0x380
> > > > [ 9.943513]  mount_root_generic+0x2e7/0x300
> > > > [ 9.947717]  prepare_namespace+0x65/0x270
> > > > [ 9.951731]  kernel_init_freeable+0x2e2/0x310
> > > > [ 9.956105]  ? __pfx_kernel_init+0x10/0x10
> > > > [ 9.960221]  kernel_init+0x16/0x1d0
> > > > [ 9.963715]  ret_from_fork+0x2d/0x50
> > > > [ 9.967303]  ? __pfx_kernel_init+0x10/0x10
> > > > [ 9.971404]  ret_from_fork_asm+0x1a/0x30
> > > > [ 9.975348]  </TASK>
> > > > [ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0) ]---
> > > >
> > > > # git log -1
> > > > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main, origin/main, origin/HEAD)
> > > > Merge: 8f7617f45009 2409fa66e29a
> > > > Author: Linus Torvalds <torva...@linux-foundation.org>
> > > > Date:   Thu Mar 13 07:58:48 2025 -1000
> > > >
> > > >     Merge tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > > >
> > > >
> > > > Thanks
> > > >
> > > > Lei
> > > >> Thanks
> > > >>
> > > >>> Best Regards
> > > >>> Lei
> > > >>>
> > > >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
> > > >>>> Current memory operations like pinning may take a lot of time at the
> > > >>>> destination. Currently they are done after the source of the migration
> > > >>>> is stopped, and before the workload is resumed at the destination.
> > > >>>> This is a period where neither traffic can flow nor the VM workload
> > > >>>> can continue (downtime).
> > > >>>>
> > > >>>> We can do better, as we know the memory layout of the guest RAM at the
> > > >>>> destination from the moment that all devices are initialized. Moving
> > > >>>> that operation earlier allows QEMU to communicate the maps to the
> > > >>>> kernel while the workload is still running on the source, so Linux can
> > > >>>> start mapping them.
> > > >>>>
> > > >>>> As a small drawback, there is a time during initialization where QEMU
> > > >>>> cannot respond to QMP etc. By some testing, this time is about
> > > >>>> 0.2 seconds. It may be further reduced (or increased) depending on the
> > > >>>> vdpa driver and the platform hardware, and it is dominated by the cost
> > > >>>> of memory pinning.
> > > >>>>
> > > >>>> This matches the time that we move out of the so-called downtime window.
> > > >>>> The downtime is measured by checking the trace timestamps from the
> > > >>>> moment the source suspends the device to the moment the destination
> > > >>>> starts the eighth and last virtqueue pair. For a 39G guest, it goes
> > > >>>> from ~2.2526 secs to 2.0949.
> > > >>>>
> > > >>>> Future directions on top of this series may include moving more things
> > > >>>> ahead of the migration time, like setting DRIVER_OK or performing
> > > >>>> actual iterative migration of virtio-net devices.
> > > >>>>
> > > >>>> Comments are welcome.
> > > >>>>
> > > >>>> This series is a different approach from series [1]. As the title does
> > > >>>> not reflect the changes anymore, please refer to the previous one to
> > > >>>> know the series history.
> > > >>>>
> > > >>>> This series is based on [2]; it must be applied after it.
> > > >>>>
> > > >>>> [Jonah Palmer]
> > > >>>> This series was rebased after [3] was pulled in, as [3] was a
> > > >>>> prerequisite fix for this series.
> > > >>>>
> > > >>>> v3:
> > > >>>> ---
> > > >>>> * Rebase
> > > >>>>
> > > >>>> v2:
> > > >>>> ---
> > > >>>> * Move the memory listener registration to the vhost_vdpa_set_owner function.
> > > >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> > > >>>>
> > > >>>> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > > >>>>
> > > >>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-epere...@redhat.com/
> > > >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > > >>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.pal...@oracle.com/
> > > >>>>
> > > >>>> Eugenio Pérez (7):
> > > >>>>   vdpa: check for iova tree initialized at net_client_start
> > > >>>>   vdpa: reorder vhost_vdpa_set_backend_cap
> > > >>>>   vdpa: set backend capabilities at vhost_vdpa_init
> > > >>>>   vdpa: add listener_registered
> > > >>>>   vdpa: reorder listener assignment
> > > >>>>   vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > > >>>>   vdpa: move memory listener register to vhost_vdpa_init
> > > >>>>
> > > >>>>  hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
> > > >>>>  include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > > >>>>  net/vhost-vdpa.c               | 34 ++----------
> > > >>>>  3 files changed, 88 insertions(+), 66 deletions(-)
> > > >>>>
> > > >>>> --
> > > >>>> 2.43.5
> > > >>>>
> > > >>>>
> > >
> >
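
A quick way to double-check the locked-memory point discussed above: the
following is a minimal, self-contained sketch (not part of this series) that
reads RLIMIT_MEMLOCK with getrlimit(), which is the limit that
"ulimit -l unlimited" adjusts in the launching shell. Since vhost-vdpa has to
pin guest RAM, a finite limit here is one plausible way to end up with map
failures and an unhealthy firmware report.

/* check_memlock.c - report the locked-memory limit of this process */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY) {
        printf("locked memory: unlimited\n");
    } else {
        /* A finite limit may cause vDPA map/pin failures. */
        printf("locked memory limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    }
    return 0;
}

Running it in the same shell that launches QEMU (after "ulimit -l unlimited")
should report "unlimited"; otherwise it prints the current byte limit.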
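On the cover letter's point that the moved-out time is dominated by memory
pinning: the rough sketch below is only an illustration, not how QEMU or the
vdpa driver actually pins guest RAM. It times mlock() on a 1 GiB anonymous
mapping to show how pinning cost grows with the amount of memory, which is why
a 39G guest sees a noticeable chunk of time move out of the downtime window.
It assumes RLIMIT_MEMLOCK is unlimited, as in the ulimit discussion earlier in
the thread.

/* pin_time.c - rough illustration of how long pinning memory takes */
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

int main(void)
{
    const size_t size = 1UL << 30; /* 1 GiB; adjust to taste */
    struct timespec t0, t1;

    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* mlock() faults in and pins every page of the mapping, similar in
     * spirit to what pinning guest RAM for a vDPA DMA map has to do. */
    if (mlock(buf, size) != 0) {
        perror("mlock (is RLIMIT_MEMLOCK unlimited?)");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("pinning %zu MiB took %.3f s\n", size >> 20,
           (double)(t1.tv_sec - t0.tv_sec) +
           (t1.tv_nsec - t0.tv_nsec) / 1e9);

    munlock(buf, size);
    munmap(buf, size);
    return 0;
}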