Hi,

On 2/4/25 11:40 PM, Eugenio Perez Martin wrote:
On Tue, Feb 4, 2025 at 1:49 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 1/31/25 12:27 PM, Eugenio Perez Martin wrote:
On Fri, Jan 31, 2025 at 6:04 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 1/24/25 1:04 PM, Eugenio Perez Martin wrote:
On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
[...]
Apologies for the delay in replying. It took me a while to figure
this out, but I have now understood why this doesn't work. L1 is
unable to receive messages from L0 because they get filtered out
by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
the MAC addresses.

In L0, I have:

$ ip a show tap0
6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
         link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
         inet 111.1.1.1/24 scope global tap0
            valid_lft forever preferred_lft forever
         inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
            valid_lft forever preferred_lft forever

In L1:

# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
         link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
         altname enp0s2
         inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
            valid_lft 83455sec preferred_lft 83455sec
         inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute
            valid_lft 86064sec preferred_lft 14064sec
         inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
            valid_lft forever preferred_lft forever

I'll call this L1-eth0.

In L2:
# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP gro0
         link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
         altname enp0s7
         inet 111.1.1.2/24 scope global eth0
            valid_lft forever preferred_lft forever

I'll call this L2-eth0.

Apart from eth0, lo is the only other device in both L1 and L2.

A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
as its destination address. When booting L2 with x-svq=false, the
value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
the frames and passes them on to L2 and pinging works [2].


So this behavior is interesting by itself. But L1's kernel net system
should not receive anything. As I read it, even if it receives it, it
should not forward the frame to L2 as it is in a different subnet. Are
you able to read it using tcpdump on L1?

I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
that were directed at L2 even though L2 was able to receive them.
Similarly, it didn't capture any packets that were sent from L2 to
L0. This is when L2 is launched with x-svq=false.
[...]
With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
receive_filter allows L2 to receive packets from L0. I added
the following line just before line 1771 [1] to check this out.

n->mac[5] = 0x57;


That's very interesting. Let me answer all the gdb questions below and
we can debug it deeper :).


Thank you for the primer on using gdb with QEMU. I am able to debug
QEMU now.

Maybe we can make the scenario clearer by telling which virtio-net
device is which with virtio-net-pci,mac=XX:... ?

However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
(LSB = 56) in virtio_net_handle_mac() [3].

Can you tell with gdb bt if this function is called from net or the
SVQ subsystem?


It looks like the function is being called from net.

(gdb) bt
#0  virtio_net_handle_mac (n=0x15622425e, cmd=85 'U', iov=0x555558865980, iov_cnt=1476792840) at ../hw/net/virtio-net.c:1098
#1  0x0000555555e5920b in virtio_net_handle_ctrl_iov (vdev=0x555558fdacd0, in_sg=0x5555580611f8, in_num=1, out_sg=0x555558061208, out_num=1) at ../hw/net/virtio-net.c:1581
#2  0x0000555555e593a0 in virtio_net_handle_ctrl (vdev=0x555558fdacd0, vq=0x555558fe7730) at ../hw/net/virtio-net.c:1610
#3  0x0000555555e9a7d8 in virtio_queue_notify_vq (vq=0x555558fe7730) at ../hw/virtio/virtio.c:2484
#4  0x0000555555e9dffb in virtio_queue_host_notifier_read (n=0x555558fe77a4) at ../hw/virtio/virtio.c:3869
#5  0x000055555620329f in aio_dispatch_handler (ctx=0x555557d9f840, node=0x7fffdca7ba80) at ../util/aio-posix.c:373
#6  0x000055555620346f in aio_dispatch_handlers (ctx=0x555557d9f840) at ../util/aio-posix.c:415
#7  0x00005555562034cb in aio_dispatch (ctx=0x555557d9f840) at ../util/aio-posix.c:425
#8  0x00005555562242b5 in aio_ctx_dispatch (source=0x555557d9f840, callback=0x0, user_data=0x0) at ../util/async.c:361
#9  0x00007ffff6d86559 in ?? () from /usr/lib/libglib-2.0.so.0
#10 0x00007ffff6d86858 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#11 0x0000555556225bf9 in glib_pollfds_poll () at ../util/main-loop.c:287
#12 0x0000555556225c87 in os_host_main_loop_wait (timeout=294672) at ../util/main-loop.c:310
#13 0x0000555556225db6 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:589
#14 0x0000555555c0c1a3 in qemu_main_loop () at ../system/runstate.c:835
#15 0x000055555612bd8d in qemu_default_main (opaque=0x0) at ../system/main.c:48
#16 0x000055555612be3d in main (argc=23, argv=0x7fffffffe508) at ../system/main.c:76

virtio_queue_notify_vq at hw/virtio/virtio.c:2484 [2] calls
vq->handle_output(vdev, vq). I see "handle_output" is a function
pointer and in this case it seems to be pointing to
virtio_net_handle_ctrl.
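
To illustrate the dispatch seen in frames #3 and #2, here is a minimal
sketch in C of a queue that stores its handler as a function pointer.
All names prefixed with toy_ are hypothetical and only model the pattern.
This is not the code in hw/virtio/virtio.c, and it assumes nothing about
QEMU's real structures beyond the handle_output(vdev, vq) call shape
mentioned above.

```c
#include <stddef.h>

struct toy_vdev;
struct toy_vq;

/* Same call shape as vq->handle_output(vdev, vq) in the backtrace. */
typedef void (*toy_handle_output_fn)(struct toy_vdev *vdev, struct toy_vq *vq);

/* Hypothetical queue: the handler is chosen when the queue is set up. */
struct toy_vq {
    toy_handle_output_fn handle_output;
};

/* Hypothetical device: counts how often the control handler ran. */
struct toy_vdev {
    int ctrl_calls;
};

/* Stand-in for a control-queue handler such as virtio_net_handle_ctrl. */
static void toy_handle_ctrl(struct toy_vdev *vdev, struct toy_vq *vq)
{
    (void)vq;
    vdev->ctrl_calls++;
}

/* Stand-in for the notify path: call whatever handler the queue holds. */
static void toy_queue_notify(struct toy_vdev *vdev, struct toy_vq *vq)
{
    if (vq->handle_output) {
        vq->handle_output(vdev, vq);
    }
}
```

In this model the notify path never names the control handler directly.
Which function runs depends only on the pointer stored in the queue,
which would be consistent with the backtrace going from
virtio_queue_notify_vq straight into virtio_net_handle_ctrl.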

[...]
With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
[3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
virtio_net_handle_mac() doesn't seem to be getting called. I haven't
understood how the MAC address is set in VirtIONet when x-svq=false.
Understanding this might help see why n->mac has different values
when x-svq is false vs when it is true.

Ok this makes sense, as x-svq=true is the one that receives the set
mac message. You should see it in L0's QEMU though, both in x-svq=on
and x-svq=off scenarios. Can you check it?

L0's QEMU seems to be receiving the "set mac" message only when L1
is launched with x-svq=true. With x-svq=off, I don't see any call
to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
in L0.


Ok this is interesting. Let's disable control virtqueue to start with
something simpler:
-device virtio-net-pci,netdev=net0,ctrl_vq=off,...

QEMU will start complaining about features that depend on ctrl_vq,
like ctrl_rx. Let's disable all of them and check this new scenario.


I am still investigating this part. I set ctrl_vq=off and ctrl_rx=off.
I didn't get any errors as such about features that depend on ctrl_vq.
However, I did notice that after booting L2 (x-svq=true as well as
x-svq=false), no eth0 device was created. There was only a "lo" interface
in L2. An eth0 interface is present only when L1 (L0 QEMU) is booted
with ctrl_vq=on and ctrl_rx=on.


Any error messages on the nested guest's dmesg?

Oh, yes, there were error messages in the output of dmesg related to
ctrl_vq. After adding the following args, there were no error messages
in dmesg.

-device virtio-net-pci,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off

I see that the eth0 interface is also created. I am able to ping L0
from L2 and vice versa as well (even with x-svq=true). This is because
n->promisc is set when these features are disabled and receive_filter() [1]
always returns 1.

Is it fixed when you set the same mac address on L0
virtio-net-pci and L1's?


I didn't have to set the same mac address in this case since promiscuous
mode seems to be getting enabled which allows pinging to work.

There is another concept that I am a little confused about. In the case
where L2 is booted with x-svq=false (and all ctrl features such as ctrl_vq,
ctrl_rx, etc. are on), I am able to ping L0 from L2. When tracing
receive_filter() in L0-QEMU, I see the values of n->mac and the destination
mac address in the ICMP packet match [2].


SVQ makes an effort to set the mac address at the beginning of
operation. L0 interprets it as "filter out all MACs except this
one". But SVQ cannot set the mac if ctrl_mac_addr=off, so the nic
receives all packets and the guest kernel needs to filter out by
itself.
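
As a rough illustration of the "set mac" message, the sketch below models
a device that accepts a MAC update only when the ctrl_mac_addr feature is
available. The class and command values (VIRTIO_NET_CTRL_MAC = 1,
VIRTIO_NET_CTRL_MAC_ADDR_SET = 1) and the ack values (0 for OK, 1 for
error) follow the virtio specification. The struct, the function name and
the flat 8-byte message layout are hypothetical simplifications and do
not mirror QEMU's virtio_net_handle_mac().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the virtio specification. */
#define TOY_CTRL_MAC          1  /* VIRTIO_NET_CTRL_MAC class */
#define TOY_CTRL_MAC_ADDR_SET 1  /* VIRTIO_NET_CTRL_MAC_ADDR_SET command */
#define TOY_OK                0  /* VIRTIO_NET_OK */
#define TOY_ERR               1  /* VIRTIO_NET_ERR */

#define TOY_MAC_LEN 6

/* Hypothetical device state: only what this sketch needs. */
struct toy_nic {
    bool ctrl_mac_addr;          /* is the feature available? */
    uint8_t mac[TOY_MAC_LEN];    /* address the device filters on */
};

/*
 * Assumed message layout: class byte, command byte, then the 6-byte MAC.
 * Returns the ack value the device would write back to the driver.
 */
static uint8_t toy_handle_ctrl_msg(struct toy_nic *n,
                                   const uint8_t *msg, size_t len)
{
    if (!n->ctrl_mac_addr || len != 2 + TOY_MAC_LEN) {
        return TOY_ERR;
    }
    if (msg[0] != TOY_CTRL_MAC || msg[1] != TOY_CTRL_MAC_ADDR_SET) {
        return TOY_ERR;
    }
    /* From here on the device filters on the address the driver sent. */
    memcpy(n->mac, msg + 2, TOY_MAC_LEN);
    return TOY_OK;
}
```

With the feature off, the stored address is never touched by the driver,
so in this model any filtering on it has to be bypassed or done by the
guest kernel instead.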

I haven't understood what n->mac refers to over here. MAC addresses are
globally unique and so the mac address of the device in L1 should be
different from that in L2.

With vDPA, they should be the same device even if they are declared in
different cmdlines or layers of virtualizations. If it were a physical
NIC, QEMU should declare the MAC of the physical NIC too.

Understood. I guess the issue with x-svq=true is that the MAC address
set in L0-QEMU's n->mac is different from the device in L2. That's why
the packets get filtered out with x-svq=true but pinging works with
x-svq=false.

There is a thread on the QEMU mailing list discussing how QEMU should
influence the control plane. Maybe it would be easier if QEMU just
checked the device's MAC and ignored the cmdline, but that behavior
would be surprising compared to the rest of the vhost backends, like
vhost-kernel. Alternatively, QEMU could just emit a warning if the MAC
is different from the one that the device reports.


Got it.

But I see L0-QEMU's n->mac is set to the mac
address of the device in L2 (allowing receive_filter to accept the packet).


That's interesting. Can you check further with gdb what receive_filter
and virtio_net_receive_rcu do? As long as virtio_net_receive_rcu
flushes the packet on the receive queue, SVQ should receive it.

The control flow irrespective of the value of x-svq is the same up till
the MAC address comparison in receive_filter() [1]. For x-svq=true,
the equality check between n->mac and the packet's destination MAC address
fails and the packet is filtered out. It is not flushed to the receive
queue. With x-svq=false, this is not the case.
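
The filtering decision described in this thread can be modelled roughly
as follows. This is a simplified sketch, assuming only two of the checks:
promiscuous mode and the unicast destination comparison. The toy_ names
are hypothetical, and the real receive_filter() in hw/net/virtio-net.c
handles more cases (broadcast, multicast, VLANs, the MAC table) that are
left out here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Hypothetical, stripped-down stand-in for the filter state. */
struct toy_filter {
    bool promisc;                /* accept every frame when set */
    uint8_t mac[TOY_ETH_ALEN];   /* address the device filters on */
};

/*
 * Returns 1 if the frame would be delivered and 0 if it would be dropped.
 * The destination MAC is the first six bytes of an Ethernet frame.
 */
static int toy_receive_filter(const struct toy_filter *f,
                              const uint8_t *frame, size_t len)
{
    if (f->promisc) {
        return 1;
    }
    if (len < TOY_ETH_ALEN) {
        return 0;
    }
    return memcmp(frame, f->mac, TOY_ETH_ALEN) == 0;
}
```

With the addresses from this thread, a frame addressed to
52:54:00:12:34:57 is dropped by this model when the filter holds
52:54:00:12:34:56, and accepted when the filter holds the :57 address or
when promisc is set.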

On 2/4/25 11:45 PM, Eugenio Perez Martin wrote:
PS: Please note that you can check packed_vq SVQ implementation
already without CVQ, as these features are totally orthogonal :).


Right. Now that I can ping with the ctrl features turned off, I think
this should take precedence. There's another issue specific to the
packed virtqueue case. It causes the kernel to crash. I have been
investigating this and the situation here looks very similar to what's
explained in Jason Wang's mail [2]. My plan of action is to apply his
changes in L2's kernel and check if that resolves the problem.

The details of the crash can be found in this mail [3].

Thanks,
Sahil

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1775
[2] https://lkml.iu.edu/hypermail/linux/kernel/1307.0/01455.html
[3] https://lists.nongnu.org/archive/html/qemu-devel/2024-12/msg01134.html

