Hi,

On 2/10/25 7:53 PM, Eugenio Perez Martin wrote:
On Mon, Feb 10, 2025 at 11:58 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 2/6/25 8:47 PM, Sahil Siddiq wrote:
On 2/6/25 12:42 PM, Eugenio Perez Martin wrote:
On Thu, Feb 6, 2025 at 6:26 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 2/4/25 11:45 PM, Eugenio Perez Martin wrote:
PS: Please note that you can check packed_vq SVQ implementation
already without CVQ, as these features are totally orthogonal :).


Right. Now that I can ping with the ctrl features turned off, I think
this should take precedence. There's another issue specific to the
packed virtqueue case. It causes the kernel to crash. I have been
investigating this and the situation here looks very similar to what's
explained in Jason Wang's mail [2]. My plan of action is to apply his
changes in L2's kernel and check if that resolves the problem.

The details of the crash can be found in this mail [3].


If you're testing this series without changes, I think that is caused
by not implementing the packed version of vhost_svq_get_buf.

https://lists.nongnu.org/archive/html/qemu-devel/2024-12/msg01902.html


Oh, apologies, I think I had misunderstood your response in the linked mail.
Until now, I thought they were unrelated. In that case, I'll implement the
packed version of vhost_svq_get_buf. Hopefully that fixes it :).


I noticed one thing while testing some of the changes that I have made.
I haven't finished making the relevant changes to all the functions which
will have to handle split and packed vq differently. L2's kernel crashes
when I launch L0-QEMU with ctrl_vq=on,ctrl_rx=on.

Interesting, is a similar crash than this? (NULL ptr deference on
virtnet_set_features)?

https://issues.redhat.com/browse/RHEL-391
I am not able to access this bug report (even with a Red Hat account). It
says it may have been deleted or I don't have the permission to view it.

It's hard to tell if this is the same issue. I don't think it is the same
issue though since I don't see any such indication in the logs. The kernel
throws the following:

[   23.047503] virtio_net virtio1: output.0:id 0 is not a head!
[   49.173243] watchdog: BUG: soft lockup - CPU#1 stuck for 25s! 
[NetworkManager:782]
[   49.174167] Modules linked in: rfkill intel_rapl_msr intel_rapl_common 
intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class 
kvg
[   49.188258] CPU: 1 PID: 782 Comm: NetworkManager Not tainted 
6.8.7-200.fc39.x86_64 #1
[   49.193196] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   49.193196] RIP: 0010:virtqueue_get_buf+0x0/0x20

Maybe I was incorrect in stating that the kernel crashes. It's more like
the kernel is stuck in a loop (according to these blog posts on soft
lockup [1][2]).

In the above trace, RIP is in virtqueue_get_buf() [3]. This is what
calls virtqueue_get_buf_ctx_packed() [4] which throws the error.

What I don't understand is why vq->packed.desc_state[id].data [5] is
NULL when the control features are turned on, but doesn't seem to be
NULL when the control features are turned off.

However, when I start
L0-QEMU with ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off, L2's
kernel boots successfully. Tracing L2-QEMU also confirms that the packed
feature is enabled. With all the ctrl features disabled, I think pinging
will also be possible once I finish implementing the packed versions of
the other functions.


Good!

There's another thing that I am confused about regarding the current
implementation (in the master branch).

In hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_vring_write_descs() [1],
svq->free_head saves the descriptor in the specified format using
"le16_to_cpu" (line 171).

Good catch, this should be le16_to_cpu actually. But code wise is the
same, so we have no visible error. Do you want to send a patch to fix
it?

Sorry, I am still a little confused here. Did you mean cpu_to_le16
by any chance? Based on what I have understood, if it is to be used
by the host machine, then it should be cpu_to_le16.

I can send a patch once this is clear, or can even integrate it in
this patch series since this patch series refactors that function
anyway.

On the other hand, the value of i is stored
in the native endianness using "cpu_to_le16" (line 168). If "i" is to be
stored in the native endianness (little endian in this case), then
should svq->free_head first be converted to little endian before being
assigned to "i" at the start of the function (line 142)?


This part is correct in the code, as it is used by the host, not
written to the guest or read from the guest. So no conversion is
needed.

Understood.

Thanks,
Sahil

[1] https://www.suse.com/support/kb/doc/?id=000018705
[2] https://softlockup.com/SystemAdministration/Linux/Kernel/softlockup/
[3] 
https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L2545
[4] 
https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1727
[5] 
https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1762


Reply via email to