Hi,
On 3/6/25 12:53 PM, Eugenio Perez Martin wrote:
On Thu, Mar 6, 2025 at 6:26 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
[...]
On 2/11/25 1:27 PM, Eugenio Perez Martin wrote:
[...]
[ 49.173243] watchdog: BUG: soft lockup - CPU#1 stuck for 25s!
[NetworkManager:782]
[ 49.174167] Modules linked in: rfkill intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class
kvg
[ 49.188258] CPU: 1 PID: 782 Comm: NetworkManager Not tainted
6.8.7-200.fc39.x86_64 #1
[ 49.193196] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 49.193196] RIP: 0010:virtqueue_get_buf+0x0/0x20
There are two possibilities here:
a) You're spending "too long" in the debugger in QEMU. From the kernel's
POV, the function virtqueue_get_buf is taking too long to complete, so it
is detected as a lockup. You can check this scenario by running QEMU
outside GDB or with all breakpoints disabled. If the message disappears,
you can ignore it; if you still see it, goto possibility b.
b) The kernel has a bug that makes it soft-lock in virtqueue_get_buf.
The kernel should not soft lockup even if your changes were malicious
:(, so it is something to be fixed. If you have the time, can you test
with the latest upstream kernel?
I wasn't running QEMU under GDB, so there may indeed be an issue in the
kernel. While I no longer see a soft lockup at this exact point after
making the changes described above, I do see one in another part of
virtio-net.
When testing my implementation with the control feature bits turned on,
the kernel throws the following warning while booting.
[ 9.046478] net eth0: Failed to disable allmulti mode.
This is printed as a dev_warn() in
drivers/net/virtio_net.c:virtnet_rx_mode_work [1].
The kernel doesn't continue booting beyond this point, and after a few
seconds it reports a soft lockup.
Maybe I was incorrect in stating that the kernel crashes. It's more that
the kernel is stuck in a loop (according to these blog posts on soft
lockups [1][2]).
In the above trace, RIP is in virtqueue_get_buf() [3], which calls
virtqueue_get_buf_ctx_packed() [4], and that is what throws the error.
What I don't understand is why vq->packed.desc_state[id].data [5] is
NULL when the control features are turned on, but doesn't seem to be
NULL when the control features are turned off.
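For reference, the failing check appears to be the head sanity test in the
packed ring code; paraphrased below from
drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() (the exact code
depends on the kernel version):

/* Paraphrased from virtqueue_get_buf_ctx_packed(); details vary by version. */
id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
if (unlikely(id >= vq->packed.vring.num)) {
        BAD_RING(vq, "id %u out of range\n", id);
        return NULL;
}
if (unlikely(!vq->packed.desc_state[id].data)) {
        BAD_RING(vq, "id %u is not a head!\n", id);
        return NULL;
}

The driver stores a cookie in desc_state[id].data when it makes a buffer
available and clears it when the buffer is detached, so a NULL entry means
the used id was never made available (or was already returned). That points
at the id SVQ writes back into the used descriptor.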
Due to the net subsystem lock, CVQ handling is not as robust / secure
against this error as the dataplane queues. There is an ongoing effort
to make it more robust, so maybe this is something to fix along those
lines.
Can you post the whole backtrace that the kernel prints?
I haven't tested these changes with the latest kernel yet. This seems like
a good time to do so, so I'll update my kernel. Here's the backtrace
printed by the kernel I currently have installed (6.8.5-201.fc39.x86_64),
in case it's relevant.
[ 65.214308] watchdog: BUG: soft lockup - CPU#0 stuck for 51s! [NetworkManage]
[ 65.215933] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intelg
[ 65.238465] CPU: 0 PID: 784 Comm: NetworkManager Tainted: G L 1
[ 65.242530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14
[ 65.248474] RIP: 0010:virtnet_send_command+0x17c/0x1e0 [virtio_net]
[ 65.251505] Code: 74 24 48 e8 f6 b1 40 c1 85 c0 78 60 48 8b 7b 08 e8 29 92 43
[ 65.260475] RSP: 0018:ffffb8038073f298 EFLAGS: 00000246
[ 65.260475] RAX: 0000000000000000 RBX: ffff8ea600f389c0 RCX: ffffb8038073f29c
[ 65.265165] RDX: 0000000000008003 RSI: 0000000000000000 RDI: ffff8ea60cead300
[ 65.269528] RBP: ffffb8038073f2c0 R08: 0000000000000001 R09: ffff8ea600f389c0
[ 65.272532] R10: 0000000000000030 R11: 0000000000000002 R12: 0000000000000002
[ 65.274483] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 65.278518] FS: 00007f2dd66f2540(0000) GS:ffff8ea67bc00000(0000) knlGS:00000
[ 65.280653] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 65.284492] CR2: 00005593d46e5868 CR3: 000000010d310001 CR4: 0000000000770ef0
[ 65.286464] PKRU: 55555554
[ 65.288524] Call Trace:
[ 65.291470] <IRQ>
[ 65.291470] ? watchdog_timer_fn+0x1e6/0x270
[ 65.293464] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 65.296496] ? __hrtimer_run_queues+0x10f/0x2b0
[ 65.297578] ? hrtimer_interrupt+0xf8/0x230
[ 65.300472] ? __sysvec_apic_timer_interrupt+0x4d/0x140
[ 65.301680] ? sysvec_apic_timer_interrupt+0x6d/0x90
[ 65.305464] </IRQ>
[ 65.305464] <TASK>
[ 65.305464] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 65.308705] ? virtnet_send_command+0x17c/0x1e0 [virtio_net]
[ 65.312466] ? virtnet_send_command+0x176/0x1e0 [virtio_net]
[ 65.314465] virtnet_set_rx_mode+0xd8/0x340 [virtio_net]
[ 65.317466] __dev_mc_add+0x79/0x80
[ 65.318462] igmp_group_added+0x1f2/0x210
[ 65.320541] ____ip_mc_inc_group+0x15b/0x250
[ 65.323522] ip_mc_up+0x4f/0xb0
[ 65.324491] inetdev_event+0x27a/0x700
[ 65.325469] ? _raw_spin_unlock_irqrestore+0xe/0x40
[ 65.329462] notifier_call_chain+0x5a/0xd0
[ 65.331717] __dev_notify_flags+0x5c/0xf0
[ 65.332491] dev_change_flags+0x54/0x70
[ 65.334508] do_setlink+0x375/0x12d0
[ 65.336554] ? __nla_validate_parse+0x61/0xd50
[ 65.338510] __rtnl_newlink+0x668/0xa30
[ 65.340733] ? security_unix_may_send+0x21/0x50
[ 65.342620] rtnl_newlink+0x47/0x70
[ 65.344556] rtnetlink_rcv_msg+0x14f/0x3c0
[ 65.346509] ? avc_has_perm_noaudit+0x6b/0xf0
[ 65.348470] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 65.350533] netlink_rcv_skb+0x58/0x110
[ 65.352482] netlink_unicast+0x1a3/0x290
[ 65.354547] netlink_sendmsg+0x223/0x490
[ 65.356480] ____sys_sendmsg+0x396/0x3d0
[ 65.357482] ? copy_msghdr_from_user+0x7d/0xc0
[ 65.360488] ___sys_sendmsg+0x9a/0xe0
[ 65.360488] __sys_sendmsg+0x7a/0xd0
[ 65.364591] do_syscall_64+0x83/0x170
[ 65.365485] ? syscall_exit_to_user_mode+0x83/0x230
[ 65.368475] ? do_syscall_64+0x90/0x170
[ 65.370477] ? _raw_spin_unlock+0xe/0x30
[ 65.372498] ? proc_sys_call_handler+0xfc/0x2e0
[ 65.374474] ? kvm_clock_get_cycles+0x18/0x30
[ 65.376475] ? ktime_get_ts64+0x47/0xe0
[ 65.378457] ? posix_get_monotonic_timespec+0x65/0xa0
[ 65.380535] ? put_timespec64+0x3e/0x70
[ 65.382458] ? syscall_exit_to_user_mode+0x83/0x230
[ 65.384542] ? do_syscall_64+0x90/0x170
[ 65.384542] ? do_syscall_64+0x90/0x170
[ 65.387505] ? ksys_write+0xd8/0xf0
[ 65.388670] ? syscall_exit_to_user_mode+0x83/0x230
[ 65.390522] ? do_syscall_64+0x90/0x170
[ 65.390522] ? syscall_exit_to_user_mode+0x83/0x230
[ 65.394472] ? do_syscall_64+0x90/0x170
[ 65.396532] ? syscall_exit_to_user_mode+0x83/0x230
[ 65.398519] ? do_syscall_64+0x90/0x170
[ 65.400486] ? do_user_addr_fault+0x304/0x670
[ 65.400486] ? clear_bhb_loop+0x55/0xb0
[ 65.404531] ? clear_bhb_loop+0x55/0xb0
[ 65.405471] ? clear_bhb_loop+0x55/0xb0
[ 65.408520] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 65.408520] RIP: 0033:0x7f2dd7810a1b
[ 65.413467] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 09 3b
[ 65.420593] RSP: 002b:00007ffcff6bd520 EFLAGS: 00000293 ORIG_RAX: 0000000000e
[ 65.425554] RAX: ffffffffffffffda RBX: 00005593d4679a90 RCX: 00007f2dd7810a1b
[ 65.428519] RDX: 0000000000000000 RSI: 00007ffcff6bd560 RDI: 000000000000000d
[ 65.430509] RBP: 00007ffcff6bd540 R08: 0000000000000000 R09: 0000000000000000
[ 65.434723] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000a
[ 65.438526] R13: 00005593d4679a90 R14: 0000000000000001 R15: 0000000000000000
[ 65.440555] </TASK>
[ 71.028432] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 71.028432] rcu: 0-....: (1 GPs behind) idle=7764/1/0x4000000000000000 s9
[ 71.036518] rcu: (t=60010 jiffies g=2193 q=1947 ncpus=4)
[ 71.041707] CPU: 0 PID: 784 Comm: NetworkManager Tainted: G L 1
[ 71.050455] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14
[ 71.055661] RIP: 0010:virtnet_send_command+0x17c/0x1e0 [virtio_net]
[ 71.059518] Code: 74 24 48 e8 f6 b1 40 c1 85 c0 78 60 48 8b 7b 08 e8 29 92 43
[ 71.065526] RSP: 0018:ffffb8038073f298 EFLAGS: 00000246
[ 71.067651] RAX: 0000000000000000 RBX: ffff8ea600f389c0 RCX: ffffb8038073f29c
[ 71.069472] RDX: 0000000000008003 RSI: 0000000000000000 RDI: ffff8ea60cead300
[ 71.071461] RBP: ffffb8038073f2c0 R08: 0000000000000001 R09: ffff8ea600f389c0
[ 71.075455] R10: 0000000000000030 R11: 0000000000000002 R12: 0000000000000002
[ 71.078461] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 71.079579] FS: 00007f2dd66f2540(0000) GS:ffff8ea67bc00000(0000) knlGS:00000
[ 71.083577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.083577] CR2: 00005593d46e5868 CR3: 000000010d310001 CR4: 0000000000770ef0
[ 71.087582] PKRU: 55555554
[ 71.090472] Call Trace:
[ 71.091452] <IRQ>
[ 71.091452] ? rcu_dump_cpu_stacks+0xc4/0x100
[ 71.095487] ? rcu_sched_clock_irq+0x32e/0x1040
[ 71.095487] ? task_tick_fair+0x40/0x3f0
[ 71.100466] ? trigger_load_balance+0x73/0x360
[ 71.100466] ? update_process_times+0x74/0xb0
[ 71.103539] ? tick_sched_handle+0x21/0x60
[ 71.107494] ? tick_nohz_highres_handler+0x6f/0x90
[ 71.107572] ? __pfx_tick_nohz_highres_handler+0x10/0x10
[ 71.111477] ? __hrtimer_run_queues+0x10f/0x2b0
[ 71.111477] ? hrtimer_interrupt+0xf8/0x230
[ 71.116489] ? __sysvec_apic_timer_interrupt+0x4d/0x140
[ 71.119526] ? sysvec_apic_timer_interrupt+0x6d/0x90
[ 71.119526] </IRQ>
[ 71.124489] <TASK>
[ 71.124489] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 71.127499] ? virtnet_send_command+0x17c/0x1e0 [virtio_net]
[ 71.127499] ? virtnet_send_command+0x176/0x1e0 [virtio_net]
[ 71.132613] virtnet_set_rx_mode+0xd8/0x340 [virtio_net]
[ 71.136474] __dev_mc_add+0x79/0x80
[ 71.136474] igmp_group_added+0x1f2/0x210
[ 71.139469] ____ip_mc_inc_group+0x15b/0x250
[ 71.140473] ip_mc_up+0x4f/0xb0
[ 71.143492] inetdev_event+0x27a/0x700
[ 71.144486] ? _raw_spin_unlock_irqrestore+0xe/0x40
[ 71.147600] notifier_call_chain+0x5a/0xd0
[ 71.148918] __dev_notify_flags+0x5c/0xf0
[ 71.151634] dev_change_flags+0x54/0x70
[ 71.153529] do_setlink+0x375/0x12d0
[ 71.155476] ? __nla_validate_parse+0x61/0xd50
[ 71.157541] __rtnl_newlink+0x668/0xa30
[ 71.159503] ? security_unix_may_send+0x21/0x50
[ 71.161954] rtnl_newlink+0x47/0x70
[ 71.163680] rtnetlink_rcv_msg+0x14f/0x3c0
[ 71.165468] ? avc_has_perm_noaudit+0x6b/0xf0
[ 71.167506] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 71.170499] netlink_rcv_skb+0x58/0x110
[ 71.171461] netlink_unicast+0x1a3/0x290
[ 71.174477] netlink_sendmsg+0x223/0x490
[ 71.175472] ____sys_sendmsg+0x396/0x3d0
[ 71.175472] ? copy_msghdr_from_user+0x7d/0xc0
[ 71.179465] ___sys_sendmsg+0x9a/0xe0
[ 71.179465] __sys_sendmsg+0x7a/0xd0
[ 71.182526] do_syscall_64+0x83/0x170
[ 71.183522] ? syscall_exit_to_user_mode+0x83/0x230
[ 71.183522] ? do_syscall_64+0x90/0x170
[ 71.187502] ? _raw_spin_unlock+0xe/0x30
[ 71.187502] ? proc_sys_call_handler+0xfc/0x2e0
[ 71.191500] ? kvm_clock_get_cycles+0x18/0x30
[ 71.191500] ? ktime_get_ts64+0x47/0xe0
[ 71.195472] ? posix_get_monotonic_timespec+0x65/0xa0
[ 71.195472] ? put_timespec64+0x3e/0x70
[ 71.198593] ? syscall_exit_to_user_mode+0x83/0x230
[ 71.199571] ? do_syscall_64+0x90/0x170
[ 71.202457] ? do_syscall_64+0x90/0x170
[ 71.203463] ? ksys_write+0xd8/0xf0
[ 71.203463] ? syscall_exit_to_user_mode+0x83/0x230
[ 71.207464] ? do_syscall_64+0x90/0x170
[ 71.207464] ? syscall_exit_to_user_mode+0x83/0x230
[ 71.211460] ? do_syscall_64+0x90/0x170
[ 71.211460] ? syscall_exit_to_user_mode+0x83/0x230
[ 71.211460] ? do_syscall_64+0x90/0x170
[ 71.216481] ? do_user_addr_fault+0x304/0x670
[ 71.216481] ? clear_bhb_loop+0x55/0xb0
[ 71.220472] ? clear_bhb_loop+0x55/0xb0
[ 71.220472] ? clear_bhb_loop+0x55/0xb0
[ 71.223704] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 71.225564] RIP: 0033:0x7f2dd7810a1b
[ 71.227495] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 09 3b
[ 71.235475] RSP: 002b:00007ffcff6bd520 EFLAGS: 00000293 ORIG_RAX: 0000000000e
[ 71.239515] RAX: ffffffffffffffda RBX: 00005593d4679a90 RCX: 00007f2dd7810a1b
[ 71.241643] RDX: 0000000000000000 RSI: 00007ffcff6bd560 RDI: 000000000000000d
[ 71.245469] RBP: 00007ffcff6bd540 R08: 0000000000000000 R09: 0000000000000000
[ 71.247467] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000a
[ 71.251479] R13: 00005593d4679a90 R14: 0000000000000001 R15: 0000000000000000
[ 71.251479] </TASK>
Yes, the kernel does soft-lock waiting for a reply if the CVQ does not
move forward. This is a known issue that is being fixed, but it is not
easy :). To achieve packed vq support, we can either disable the CVQ
entirely or try to process the message the kernel is trying to send.
Both approaches come down to the same functions in SVQ, so you can
pick the one you feel more comfortable with :).
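For context on why this spins rather than just erroring out: in the 6.8
kernels you are running, virtnet_send_command() busy-waits for the device
to mark the command buffer as used, roughly like this (paraphrased from
drivers/net/virtio_net.c; later kernels rework this path):

/* Paraphrased from virtnet_send_command(); the source comment notes the
 * kernel may be executing other tasks meanwhile. */
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
       !virtqueue_is_broken(vi->cvq))
        cpu_relax();

If SVQ never forwards the command to the device, or never writes the used
descriptor back, this loop spins forever in kernel context; that is what
the watchdog reports, and it matches the RIP in virtnet_send_command in
your backtrace.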
I would like to start by disabling the CVQ altogether. The data plane
implementation still has an issue, described below. Once that is
resolved, it should be easier to handle the CVQ.
When I test my changes with the control feature bits turned off, I see another
issue. The kernel boots successfully in this case, but I noticed that no new
elements are added to the dataplane virtqueues. This is because, in
hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_translate_addr() [2], when gpas
is not null and QEMU tries to retrieve the IOVA address from the GPA->IOVA
tree, the result of the lookup, map, is NULL at the following line [3]:
map = vhost_iova_tree_find_gpa(svq->iova_tree, &needle);
Due to this, vhost_svq_vring_write_descs() [4] simply returns false and nothing
is added to the virtqueue.
This issue is present even for split virtqueues when I test my changes with
"packed=off". However, I don't see it when I build QEMU from the master
branch. I think the issue might lie in how memory is allocated for the
virtqueues in my implementation, but I am not sure. I have a few ideas about
how to debug this (one is sketched below); I'll let you know if I find
anything else.
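For instance, one idea is temporary logging around the failing lookup so the
untranslated GPA can be compared against the mappings that the vhost-vdpa
memory listener inserted into the tree. A minimal sketch (assuming the DMAMap
fields used by the GPA->IOVA tree on my branch; names may differ on other
trees):

/* Temporary instrumentation in vhost_svq_translate_addr(); sketch only. */
DMAMap needle = {
    .translated_addr = gpas[i],
    .size = iovec[i].iov_len - 1,
};
const DMAMap *map = vhost_iova_tree_find_gpa(svq->iova_tree, &needle);
if (!map) {
    error_report("SVQ: no IOVA mapping for GPA 0x%" PRIx64 " (len %zu)",
                 (uint64_t)gpas[i], iovec[i].iov_len);
    return false;
}

If the logged GPA never shows up in the listener's region_add path, the
mapping was never inserted; if it does, the bounds or the translated_addr
stored in the tree must differ from what the lookup expects.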
Understood! In case you run out of ideas, it seems like a good
candidate for bisection.
Thanks for the update!
I managed to make some progress; I am no longer having problems with this.
I am facing a new issue, though, and it does not arise every time.
When testing the current state of my changes, I am able to ping L0 from L2
and vice versa via packed vqs. Unfortunately, this does not work every time.
I thought I would send a new patch series for review in case I have missed
something.
I have explained the issue in greater detail in the cover letter.
Thanks,
Sahil