Hi,

Sorry for the delay in my response. There was a lot to absorb in the
previous mail and I thought I would spend some more time exploring
this.

On 2/11/25 1:27 PM, Eugenio Perez Martin wrote:
On Mon, Feb 10, 2025 at 5:25 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 2/10/25 7:53 PM, Eugenio Perez Martin wrote:
On Mon, Feb 10, 2025 at 11:58 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 2/6/25 8:47 PM, Sahil Siddiq wrote:
On 2/6/25 12:42 PM, Eugenio Perez Martin wrote:
On Thu, Feb 6, 2025 at 6:26 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 2/4/25 11:45 PM, Eugenio Perez Martin wrote:
PS: Please note that you can already test the packed_vq SVQ implementation
without CVQ, as these features are totally orthogonal :).


Right. Now that I can ping with the ctrl features turned off, I think
this should take precedence. There's another issue specific to the
packed virtqueue case. It causes the kernel to crash. I have been
investigating this and the situation here looks very similar to what's
explained in Jason Wang's mail [2]. My plan of action is to apply his
changes in L2's kernel and check if that resolves the problem.

The details of the crash can be found in this mail [3].


If you're testing this series without changes, I think that is caused
by not implementing the packed version of vhost_svq_get_buf.

https://lists.nongnu.org/archive/html/qemu-devel/2024-12/msg01902.html


Oh, apologies, I think I had misunderstood your response in the linked mail.
Until now, I thought they were unrelated. In that case, I'll implement the
packed version of vhost_svq_get_buf. Hopefully that fixes it :).
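
As a starting point, the driver-side check for a used descriptor in a packed
ring is what vhost_svq_get_buf_packed will have to replicate, since SVQ plays
the driver role towards the vdpa device. Paraphrasing is_used_desc_packed()
from Linux's drivers/virtio/virtio_ring.c (the exact code may differ between
kernel versions, and the SVQ version would use its own ring and wrap counter):

    /*
     * A packed descriptor is "used" when its AVAIL and USED flag bits are
     * equal to each other and to the reader's used wrap counter.
     */
    static bool desc_is_used_packed(const struct vring_packed_desc *desc,
                                    bool used_wrap_counter)
    {
        uint16_t flags = le16_to_cpu(desc->flags);
        bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
        bool used = flags & (1 << VRING_PACKED_DESC_F_USED);

        return avail == used && used == used_wrap_counter;
    }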


I noticed one thing while testing some of the changes that I have made.
I haven't yet finished making the relevant changes to all the functions
that will have to handle split and packed vqs differently. L2's kernel
crashes when I launch L0-QEMU with ctrl_vq=on,ctrl_rx=on.

Interesting, is it a similar crash to this one (NULL ptr dereference in
virtnet_set_features)?

https://issues.redhat.com/browse/RHEL-391
I am not able to access this bug report (even with a Red Hat account). It
says it may have been deleted or that I don't have permission to view it.

It's hard to tell if this is the same issue, but I don't think it is,
since I don't see any such indication in the logs. The kernel throws the
following:

[   23.047503] virtio_net virtio1: output.0:id 0 is not a head!

This is a common error when modifying dataplane code; it is hard to make
deep changes without seeing this error at least once :). It indicates
that your code is marking descriptor id 0 as used when the guest
didn't make it available.

Right, I explored this a little further. I noticed a few issues in the
way my implementation was handling packed vqs (apart from the missing
vhost_svq_get_buf_packed). After making the relevant changes and
implementing vhost_svq_get_buf_packed, I no longer see this issue.
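
(For context, the "is not a head" message comes from the guest kernel's
get_buf path: the id the device writes back must refer to a chain head that
the driver actually made available, i.e. one with per-id state attached.
Paraphrasing the packed variant in drivers/virtio/virtio_ring.c, abridged and
possibly differing slightly between kernel versions:)

    /* virtqueue_get_buf_ctx_packed(), abridged */
    id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
    if (unlikely(id >= vq->packed.vring.num)) {
        BAD_RING(vq, "id %u out of range\n", id);
        return NULL;
    }
    if (unlikely(!vq->packed.desc_state[id].data)) {
        BAD_RING(vq, "id %u is not a head!\n", id);
        return NULL;
    }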

If this is happening in the control virtqueue, I'd check whether the code is
setting the flags as used in ring[1] when it shouldn't. But my bet is
that the rx queue is the wrong one.

The flags were one of the issues: I hadn't initialized "avail_used_flags"
correctly. Rectifying that seems to have solved this issue. However, I see
two new issues (described further below).
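
For reference, the convention I am now following for the flags is modeled on
the driver side of Linux's drivers/virtio/virtio_ring.c; the SVQ field names
below are my own placeholders, so treat this as a sketch rather than the
final code:

    /*
     * The driver's avail wrap counter starts at 1, so freshly written
     * descriptors carry AVAIL set and USED clear.
     */
    svq->avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL;

    /*
     * Every time the producer index wraps around the ring, both bits are
     * flipped so the device can tell descriptors of the new lap from the
     * previous one.
     */
    if (++svq->next_avail_idx >= svq->num) {
        svq->next_avail_idx = 0;
        svq->avail_used_flags ^= (1 << VRING_PACKED_DESC_F_AVAIL) |
                                 (1 << VRING_PACKED_DESC_F_USED);
    }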

[   49.173243] watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [NetworkManager:782]
[   49.174167] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvg
[   49.188258] CPU: 1 PID: 782 Comm: NetworkManager Not tainted 
6.8.7-200.fc39.x86_64 #1
[   49.193196] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   49.193196] RIP: 0010:virtqueue_get_buf+0x0/0x20


Two possibilities about this part:
a) You're spending "too long" in the debugger in QEMU. From the kernel
POV, the function virtqueue_get_buf is taking too long to complete so
it detects it as a lockup. You can check this scenario by not running
QEMU under GDB or disabling all breakpoints. You can ignore this
message if you don't find the error this way. If you still see the
message, goto possibility b.

b) The kernel has a bug that makes it softlockup in virtqueue_get_buf.
The kernel should not soft lockup even if your changes were malicious
:(, so it is something to be fixed. If you have the time, can you test
with the latest upstream kernel?

I wasn't running QEMU under GDB, so there may indeed be an issue in the
kernel. While I don't see a soft lockup at this exact point after making
the above described changes, I do see a soft lockup issue in another part
of virtio-net.

When testing my implementation with the control feature bits turned on,
the kernel throws the following warning while booting.

[    9.046478] net eth0: Failed to disable allmulti mode.

This is printed as a dev_warn() in 
drivers/net/virtio_net.c:virtnet_rx_mode_work [1].
The kernel doesn't continue booting beyond this point and after a few seconds,
it reports a soft lockup.

Maybe I was incorrect in stating that the kernel crashes. It's more like
the kernel is stuck in a loop (according to these blog posts on soft
lockup [1][2]).

In the above trace, RIP is in virtqueue_get_buf() [3]. This is what
calls virtqueue_get_buf_ctx_packed() [4] which throws the error.

What I don't understand is why vq->packed.desc_state[id].data [5] is
NULL when the control features are turned on, but doesn't seem to be
NULL when the control features are turned off.

Due to the net subsystem lock, CVQ handling is not as robust / secure
against this error as the dataplane queues. There is an ongoing effort
to make it more robust, so maybe this is something to fix as part of that
effort.

Can you post the whole backtrace that the kernel prints?

I haven't tested these changes with the latest kernel yet. This would be
a good time to test against it, so I'll update my kernel.

Here's the backtrace printed by the kernel I currently have installed
(6.8.5-201.fc39.x86_64), in case it is relevant.

[   65.214308] watchdog: BUG: soft lockup - CPU#0 stuck for 51s! [NetworkManage]
[   65.215933] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intelg
[   65.238465] CPU: 0 PID: 784 Comm: NetworkManager Tainted: G             L   1
[   65.242530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14
[   65.248474] RIP: 0010:virtnet_send_command+0x17c/0x1e0 [virtio_net]
[   65.251505] Code: 74 24 48 e8 f6 b1 40 c1 85 c0 78 60 48 8b 7b 08 e8 29 92 43
[   65.260475] RSP: 0018:ffffb8038073f298 EFLAGS: 00000246
[   65.260475] RAX: 0000000000000000 RBX: ffff8ea600f389c0 RCX: ffffb8038073f29c
[   65.265165] RDX: 0000000000008003 RSI: 0000000000000000 RDI: ffff8ea60cead300
[   65.269528] RBP: ffffb8038073f2c0 R08: 0000000000000001 R09: ffff8ea600f389c0
[   65.272532] R10: 0000000000000030 R11: 0000000000000002 R12: 0000000000000002
[   65.274483] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   65.278518] FS:  00007f2dd66f2540(0000) GS:ffff8ea67bc00000(0000) knlGS:00000
[   65.280653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   65.284492] CR2: 00005593d46e5868 CR3: 000000010d310001 CR4: 0000000000770ef0
[   65.286464] PKRU: 55555554
[   65.288524] Call Trace:
[   65.291470]  <IRQ>
[   65.291470]  ? watchdog_timer_fn+0x1e6/0x270
[   65.293464]  ? __pfx_watchdog_timer_fn+0x10/0x10
[   65.296496]  ? __hrtimer_run_queues+0x10f/0x2b0
[   65.297578]  ? hrtimer_interrupt+0xf8/0x230
[   65.300472]  ? __sysvec_apic_timer_interrupt+0x4d/0x140
[   65.301680]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[   65.305464]  </IRQ>
[   65.305464]  <TASK>
[   65.305464]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   65.308705]  ? virtnet_send_command+0x17c/0x1e0 [virtio_net]
[   65.312466]  ? virtnet_send_command+0x176/0x1e0 [virtio_net]
[   65.314465]  virtnet_set_rx_mode+0xd8/0x340 [virtio_net]
[   65.317466]  __dev_mc_add+0x79/0x80
[   65.318462]  igmp_group_added+0x1f2/0x210
[   65.320541]  ____ip_mc_inc_group+0x15b/0x250
[   65.323522]  ip_mc_up+0x4f/0xb0
[   65.324491]  inetdev_event+0x27a/0x700
[   65.325469]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[   65.329462]  notifier_call_chain+0x5a/0xd0
[   65.331717]  __dev_notify_flags+0x5c/0xf0
[   65.332491]  dev_change_flags+0x54/0x70
[   65.334508]  do_setlink+0x375/0x12d0
[   65.336554]  ? __nla_validate_parse+0x61/0xd50
[   65.338510]  __rtnl_newlink+0x668/0xa30
[   65.340733]  ? security_unix_may_send+0x21/0x50
[   65.342620]  rtnl_newlink+0x47/0x70
[   65.344556]  rtnetlink_rcv_msg+0x14f/0x3c0
[   65.346509]  ? avc_has_perm_noaudit+0x6b/0xf0
[   65.348470]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   65.350533]  netlink_rcv_skb+0x58/0x110
[   65.352482]  netlink_unicast+0x1a3/0x290
[   65.354547]  netlink_sendmsg+0x223/0x490
[   65.356480]  ____sys_sendmsg+0x396/0x3d0
[   65.357482]  ? copy_msghdr_from_user+0x7d/0xc0
[   65.360488]  ___sys_sendmsg+0x9a/0xe0
[   65.360488]  __sys_sendmsg+0x7a/0xd0
[   65.364591]  do_syscall_64+0x83/0x170
[   65.365485]  ? syscall_exit_to_user_mode+0x83/0x230
[   65.368475]  ? do_syscall_64+0x90/0x170
[   65.370477]  ? _raw_spin_unlock+0xe/0x30
[   65.372498]  ? proc_sys_call_handler+0xfc/0x2e0
[   65.374474]  ? kvm_clock_get_cycles+0x18/0x30
[   65.376475]  ? ktime_get_ts64+0x47/0xe0
[   65.378457]  ? posix_get_monotonic_timespec+0x65/0xa0
[   65.380535]  ? put_timespec64+0x3e/0x70
[   65.382458]  ? syscall_exit_to_user_mode+0x83/0x230
[   65.384542]  ? do_syscall_64+0x90/0x170
[   65.384542]  ? do_syscall_64+0x90/0x170
[   65.387505]  ? ksys_write+0xd8/0xf0
[   65.388670]  ? syscall_exit_to_user_mode+0x83/0x230
[   65.390522]  ? do_syscall_64+0x90/0x170
[   65.390522]  ? syscall_exit_to_user_mode+0x83/0x230
[   65.394472]  ? do_syscall_64+0x90/0x170
[   65.396532]  ? syscall_exit_to_user_mode+0x83/0x230
[   65.398519]  ? do_syscall_64+0x90/0x170
[   65.400486]  ? do_user_addr_fault+0x304/0x670
[   65.400486]  ? clear_bhb_loop+0x55/0xb0
[   65.404531]  ? clear_bhb_loop+0x55/0xb0
[   65.405471]  ? clear_bhb_loop+0x55/0xb0
[   65.408520]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   65.408520] RIP: 0033:0x7f2dd7810a1b
[   65.413467] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 09 3b
[   65.420593] RSP: 002b:00007ffcff6bd520 EFLAGS: 00000293 ORIG_RAX: 0000000000e
[   65.425554] RAX: ffffffffffffffda RBX: 00005593d4679a90 RCX: 00007f2dd7810a1b
[   65.428519] RDX: 0000000000000000 RSI: 00007ffcff6bd560 RDI: 000000000000000d
[   65.430509] RBP: 00007ffcff6bd540 R08: 0000000000000000 R09: 0000000000000000
[   65.434723] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000a
[   65.438526] R13: 00005593d4679a90 R14: 0000000000000001 R15: 0000000000000000
[   65.440555]  </TASK>
[   71.028432] rcu: INFO: rcu_preempt self-detected stall on CPU
[   71.028432] rcu:     0-....: (1 GPs behind) idle=7764/1/0x4000000000000000 s9
[   71.036518] rcu:     (t=60010 jiffies g=2193 q=1947 ncpus=4)
[   71.041707] CPU: 0 PID: 784 Comm: NetworkManager Tainted: G             L   1
[   71.050455] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14
[   71.055661] RIP: 0010:virtnet_send_command+0x17c/0x1e0 [virtio_net]
[   71.059518] Code: 74 24 48 e8 f6 b1 40 c1 85 c0 78 60 48 8b 7b 08 e8 29 92 43
[   71.065526] RSP: 0018:ffffb8038073f298 EFLAGS: 00000246
[   71.067651] RAX: 0000000000000000 RBX: ffff8ea600f389c0 RCX: ffffb8038073f29c
[   71.069472] RDX: 0000000000008003 RSI: 0000000000000000 RDI: ffff8ea60cead300
[   71.071461] RBP: ffffb8038073f2c0 R08: 0000000000000001 R09: ffff8ea600f389c0
[   71.075455] R10: 0000000000000030 R11: 0000000000000002 R12: 0000000000000002
[   71.078461] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   71.079579] FS:  00007f2dd66f2540(0000) GS:ffff8ea67bc00000(0000) knlGS:00000
[   71.083577] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.083577] CR2: 00005593d46e5868 CR3: 000000010d310001 CR4: 0000000000770ef0
[   71.087582] PKRU: 55555554
[   71.090472] Call Trace:
[   71.091452]  <IRQ>
[   71.091452]  ? rcu_dump_cpu_stacks+0xc4/0x100
[   71.095487]  ? rcu_sched_clock_irq+0x32e/0x1040
[   71.095487]  ? task_tick_fair+0x40/0x3f0
[   71.100466]  ? trigger_load_balance+0x73/0x360
[   71.100466]  ? update_process_times+0x74/0xb0
[   71.103539]  ? tick_sched_handle+0x21/0x60
[   71.107494]  ? tick_nohz_highres_handler+0x6f/0x90
[   71.107572]  ? __pfx_tick_nohz_highres_handler+0x10/0x10
[   71.111477]  ? __hrtimer_run_queues+0x10f/0x2b0
[   71.111477]  ? hrtimer_interrupt+0xf8/0x230
[   71.116489]  ? __sysvec_apic_timer_interrupt+0x4d/0x140
[   71.119526]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[   71.119526]  </IRQ>
[   71.124489]  <TASK>
[   71.124489]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   71.127499]  ? virtnet_send_command+0x17c/0x1e0 [virtio_net]
[   71.127499]  ? virtnet_send_command+0x176/0x1e0 [virtio_net]
[   71.132613]  virtnet_set_rx_mode+0xd8/0x340 [virtio_net]
[   71.136474]  __dev_mc_add+0x79/0x80
[   71.136474]  igmp_group_added+0x1f2/0x210
[   71.139469]  ____ip_mc_inc_group+0x15b/0x250
[   71.140473]  ip_mc_up+0x4f/0xb0
[   71.143492]  inetdev_event+0x27a/0x700
[   71.144486]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[   71.147600]  notifier_call_chain+0x5a/0xd0
[   71.148918]  __dev_notify_flags+0x5c/0xf0
[   71.151634]  dev_change_flags+0x54/0x70
[   71.153529]  do_setlink+0x375/0x12d0
[   71.155476]  ? __nla_validate_parse+0x61/0xd50
[   71.157541]  __rtnl_newlink+0x668/0xa30
[   71.159503]  ? security_unix_may_send+0x21/0x50
[   71.161954]  rtnl_newlink+0x47/0x70
[   71.163680]  rtnetlink_rcv_msg+0x14f/0x3c0
[   71.165468]  ? avc_has_perm_noaudit+0x6b/0xf0
[   71.167506]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   71.170499]  netlink_rcv_skb+0x58/0x110
[   71.171461]  netlink_unicast+0x1a3/0x290
[   71.174477]  netlink_sendmsg+0x223/0x490
[   71.175472]  ____sys_sendmsg+0x396/0x3d0
[   71.175472]  ? copy_msghdr_from_user+0x7d/0xc0
[   71.179465]  ___sys_sendmsg+0x9a/0xe0
[   71.179465]  __sys_sendmsg+0x7a/0xd0
[   71.182526]  do_syscall_64+0x83/0x170
[   71.183522]  ? syscall_exit_to_user_mode+0x83/0x230
[   71.183522]  ? do_syscall_64+0x90/0x170
[   71.187502]  ? _raw_spin_unlock+0xe/0x30
[   71.187502]  ? proc_sys_call_handler+0xfc/0x2e0
[   71.191500]  ? kvm_clock_get_cycles+0x18/0x30
[   71.191500]  ? ktime_get_ts64+0x47/0xe0
[   71.195472]  ? posix_get_monotonic_timespec+0x65/0xa0
[   71.195472]  ? put_timespec64+0x3e/0x70
[   71.198593]  ? syscall_exit_to_user_mode+0x83/0x230
[   71.199571]  ? do_syscall_64+0x90/0x170
[   71.202457]  ? do_syscall_64+0x90/0x170
[   71.203463]  ? ksys_write+0xd8/0xf0
[   71.203463]  ? syscall_exit_to_user_mode+0x83/0x230
[   71.207464]  ? do_syscall_64+0x90/0x170
[   71.207464]  ? syscall_exit_to_user_mode+0x83/0x230
[   71.211460]  ? do_syscall_64+0x90/0x170
[   71.211460]  ? syscall_exit_to_user_mode+0x83/0x230
[   71.211460]  ? do_syscall_64+0x90/0x170
[   71.216481]  ? do_user_addr_fault+0x304/0x670
[   71.216481]  ? clear_bhb_loop+0x55/0xb0
[   71.220472]  ? clear_bhb_loop+0x55/0xb0
[   71.220472]  ? clear_bhb_loop+0x55/0xb0
[   71.223704]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   71.225564] RIP: 0033:0x7f2dd7810a1b
[   71.227495] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 09 3b
[   71.235475] RSP: 002b:00007ffcff6bd520 EFLAGS: 00000293 ORIG_RAX: 0000000000e
[   71.239515] RAX: ffffffffffffffda RBX: 00005593d4679a90 RCX: 00007f2dd7810a1b
[   71.241643] RDX: 0000000000000000 RSI: 00007ffcff6bd560 RDI: 000000000000000d
[   71.245469] RBP: 00007ffcff6bd540 R08: 0000000000000000 R09: 0000000000000000
[   71.247467] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000a
[   71.251479] R13: 00005593d4679a90 R14: 0000000000000001 R15: 0000000000000000
[   71.251479]  </TASK>
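
(RIP in both dumps is inside virtnet_send_command(), which, after kicking the
CVQ, busy-waits for the device to hand the command buffer back. Roughly, from
drivers/net/virtio_net.c, possibly differing slightly between kernel
versions:)

    /*
     * Spin for a response; if SVQ never marks the command buffer as used,
     * this loop never exits and the soft-lockup watchdog eventually fires.
     */
    while (!virtqueue_get_buf(vi->cvq, &tmp) &&
           !virtqueue_is_broken(vi->cvq))
        cpu_relax();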

When I test my changes with the control feature bits turned off, I see another
issue. The kernel boots successfully in this case, but I noticed that no new
elements are added to the dataplane virtqueues. This is because, in
hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_translate_addr() [2], when gpas
is not null and QEMU tries to retrieve the IOVA address from the GPA->IOVA
tree, "map" is NULL after the following line [3]:

map = vhost_iova_tree_find_gpa(svq->iova_tree, &needle)

Due to this, vhost_svq_vring_write_descs() [4] simply returns false and nothing
is added to the virtqueue.

This issue is present even for split virtqueues when I test my changes with
"packed=off". However, I don't see any issues when I build QEMU from the master
branch. I think the issue might lie in how memory is allocated for the
virtqueues in my implementation, but I am not sure. I have a few ideas on
how this can be debugged. I'll let you know if I find anything else.
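
For reference, the failing path is roughly the following (an abridged sketch
of the relevant branch of vhost_svq_translate_addr(), annotated by me; see
[2] and [3] for the exact code):

    if (gpas) {
        /*
         * The buffer is backed by guest memory, so look its GPA up in the
         * GPA->IOVA tree populated by the vhost-vdpa memory listener.
         */
        needle = (DMAMap) {
            .translated_addr = gpas[i],
            .size = iovec[i].iov_len,
        };
        map = vhost_iova_tree_find_gpa(svq->iova_tree, &needle);
    }
    if (unlikely(!map)) {
        /*
         * This is the branch being hit: with no matching GPA->IOVA entry,
         * translation fails and vhost_svq_vring_write_descs() ends up
         * adding nothing to the vring.
         */
        return false;
    }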

[...]
There's another thing that I am confused about regarding the current
implementation (in the master branch).

In hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_vring_write_descs() [1],
svq->free_head saves the descriptor in the specified format using
"le16_to_cpu" (line 171).

Good catch, this should be le16_to_cpu actually. But code-wise it is the
same, so we have no visible error. Do you want to send a patch to fix
it?

Sorry, I am still a little confused here. Did you mean cpu_to_le16
by any chance? Based on what I have understood, if it is to be used
by the host machine, then it should be cpu_to_le16.

I can send a patch once this is clear, or can even integrate it in
this patch series since this patch series refactors that function
anyway.


Ok, I don't know how I read the function to answer you that :(. Let me
start from scratch:

In line 171, we're copying data from QEMU internals, which are not in
guest memory, to other QEMU internals. So no cpu_to_le* or
le*_to_cpu conversion is needed.

Understood.

On the other hand, the value of i is stored
in the native endianness using "cpu_to_le16" (line 168). If "i" is to be
stored in the native endianness (little endian in this case), then
should svq->free_head first be converted to little endian before being
assigned to "i" at the start of the function (line 142)?



No endianness conversion is needed here for the same reason: everything is
internal to QEMU and not intended to be seen by the guest.


Got it. This makes sense now.
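
To summarize my understanding with the code in front of me: only the fields
that the guest (or the device) actually reads from the shared vring need the
little-endian conversion, while svq->desc_next[] and svq->free_head are
QEMU-internal bookkeeping. An abridged sketch of how I now read
vhost_svq_vring_write_descs() (assuming a fix that drops the conversions on
the internal copies, which are no-ops on a little-endian host anyway):

    /* Guest-visible descriptor fields live in the shared ring: convert. */
    descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
    descs[i].next = cpu_to_le16(svq->desc_next[i]);

    /*
     * QEMU-internal bookkeeping: no conversion needed in either direction
     * (lines 168 and 171 in the current code).
     */
    i = svq->desc_next[i];
    svq->free_head = svq->desc_next[last];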

Thanks,
Sahil

[1] https://github.com/torvalds/linux/blob/master/drivers/net/virtio_net.c#L3712
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L83
[3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L104
[4] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L171

