On Thu, Feb 6, 2025 at 6:26 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> On 2/4/25 11:40 PM, Eugenio Perez Martin wrote:
> > On Tue, Feb 4, 2025 at 1:49 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >> On 1/31/25 12:27 PM, Eugenio Perez Martin wrote:
> >>> On Fri, Jan 31, 2025 at 6:04 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>>> On 1/24/25 1:04 PM, Eugenio Perez Martin wrote:
> >>>>> On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>>>>> On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
> >>>>>>> On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >>>>>>>> On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
> >>>>>>>> [...]
> >>>>>>>> Apologies for the delay in replying. It took me a while to figure
> >>>>>>>> this out, but I have now understood why this doesn't work. L1 is
> >>>>>>>> unable to receive messages from L0 because they get filtered out
> >>>>>>>> by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
> >>>>>>>> the MAC addresses.
> >>>>>>>>
> >>>>>>>> In L0, I have:
> >>>>>>>>
> >>>>>>>> $ ip a show tap0
> >>>>>>>> 6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
> >>>>>>>>     link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
> >>>>>>>>     inet 111.1.1.1/24 scope global tap0
> >>>>>>>>        valid_lft forever preferred_lft forever
> >>>>>>>>     inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
> >>>>>>>>        valid_lft forever preferred_lft forever
> >>>>>>>>
> >>>>>>>> In L1:
> >>>>>>>>
> >>>>>>>> # ip a show eth0
> >>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
> >>>>>>>>     link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
> >>>>>>>>     altname enp0s2
> >>>>>>>>     inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
> >>>>>>>>        valid_lft 83455sec preferred_lft 83455sec
> >>>>>>>>     inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute
> >>>>>>>>        valid_lft 86064sec preferred_lft 14064sec
> >>>>>>>>     inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
> >>>>>>>>        valid_lft forever preferred_lft forever
> >>>>>>>>
> >>>>>>>> I'll call this L1-eth0.
> >>>>>>>>
> >>>>>>>> In L2:
> >>>>>>>> # ip a show eth0
> >>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP gro0
> >>>>>>>>     link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
> >>>>>>>>     altname enp0s7
> >>>>>>>>     inet 111.1.1.2/24 scope global eth0
> >>>>>>>>        valid_lft forever preferred_lft forever
> >>>>>>>>
> >>>>>>>> I'll call this L2-eth0.
> >>>>>>>>
> >>>>>>>> Apart from eth0, lo is the only other device in both L1 and L2.
> >>>>>>>>
> >>>>>>>> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
> >>>>>>>> as its destination address. When booting L2 with x-svq=false, the
> >>>>>>>> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
> >>>>>>>> the frames and passes them on to L2 and pinging works [2].
> >>>>>>>>
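To make the failure mode concrete: the drop happens in the unicast branch
of receive_filter(), which accepts a frame only if the device is
promiscuous or the destination MAC matches n->mac byte-for-byte. A toy
model of that check (a standalone illustration using your two addresses,
not the actual QEMU code):

    /* Toy model of receive_filter()'s unicast MAC check. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define ETH_ALEN 6

    static int accept_frame(const uint8_t *dest_mac, const uint8_t *n_mac,
                            int promisc)
    {
        if (promisc) {
            return 1;   /* promiscuous mode accepts everything */
        }
        return memcmp(dest_mac, n_mac, ETH_ALEN) == 0;
    }

    int main(void)
    {
        const uint8_t l1_mac[] = {0x52, 0x54, 0x00, 0x12, 0x34, 0x56}; /* L1-eth0 */
        const uint8_t l2_mac[] = {0x52, 0x54, 0x00, 0x12, 0x34, 0x57}; /* L2-eth0 */

        /* Frames from L0 are addressed to L2-eth0, so they pass only
         * while n->mac also holds L2-eth0: */
        printf("n->mac == L2-eth0: %d\n", accept_frame(l2_mac, l2_mac, 0)); /* 1 */
        printf("n->mac == L1-eth0: %d\n", accept_frame(l2_mac, l1_mac, 0)); /* 0 */
        return 0;
    }

So everything below hinges on what n->mac contains at the moment the
frame arrives.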
> >>>>>>>
> >>>>>>> So this behavior is interesting by itself. But L1's kernel net system
> >>>>>>> should not receive anything. As I read it, even if it receives it, it
> >>>>>>> should not forward the frame to L2 as it is in a different subnet. Are
> >>>>>>> you able to read it using tcpdump on L1?
> >>>>>>
> >>>>>> I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
> >>>>>> that were directed at L2 even though L2 was able to receive them.
> >>>>>> Similarly, it didn't capture any packets that were sent from L2 to
> >>>>>> L0. This is when L2 is launched with x-svq=false.
> >>>>>> [...]
> >>>>>> With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
> >>>>>> receive_filter allows L2 to receive packets from L0. I added
> >>>>>> the following line just before line 1771 [1] to check this out.
> >>>>>>
> >>>>>> n->mac[5] = 0x57;
> >>>>>>
> >>>>>
> >>>>> That's very interesting. Let me answer all the gdb questions below and
> >>>>> we can debug it deeper :).
> >>>>>
> >>>>
> >>>> Thank you for the primer on using gdb with QEMU. I am able to debug
> >>>> QEMU now.
> >>>>
> >>>>>>> Maybe we can make the scenario clearer by telling which virtio-net
> >>>>>>> device is which with virtio-net-pci,mac=XX:... ?
> >>>>>>>
> >>>>>>>> However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
> >>>>>>>> (LSB = 56) in virtio_net_handle_mac() [3].
> >>>>>>>
> >>>>>>> Can you tell with gdb bt if this function is called from net or the
> >>>>>>> SVQ subsystem?
> >>>>>>
> >>>>
> >>>> It looks like the function is being called from net.
> >>>>
> >>>> (gdb) bt
> >>>> #0  virtio_net_handle_mac (n=0x15622425e, cmd=85 'U',
> >>>>     iov=0x555558865980, iov_cnt=1476792840) at ../hw/net/virtio-net.c:1098
> >>>> #1  0x0000555555e5920b in virtio_net_handle_ctrl_iov (vdev=0x555558fdacd0,
> >>>>     in_sg=0x5555580611f8, in_num=1, out_sg=0x555558061208, out_num=1)
> >>>>     at ../hw/net/virtio-net.c:1581
> >>>> #2  0x0000555555e593a0 in virtio_net_handle_ctrl (vdev=0x555558fdacd0,
> >>>>     vq=0x555558fe7730) at ../hw/net/virtio-net.c:1610
> >>>> #3  0x0000555555e9a7d8 in virtio_queue_notify_vq (vq=0x555558fe7730)
> >>>>     at ../hw/virtio/virtio.c:2484
> >>>> #4  0x0000555555e9dffb in virtio_queue_host_notifier_read (n=0x555558fe77a4)
> >>>>     at ../hw/virtio/virtio.c:3869
> >>>> #5  0x000055555620329f in aio_dispatch_handler (ctx=0x555557d9f840,
> >>>>     node=0x7fffdca7ba80) at ../util/aio-posix.c:373
> >>>> #6  0x000055555620346f in aio_dispatch_handlers (ctx=0x555557d9f840)
> >>>>     at ../util/aio-posix.c:415
> >>>> #7  0x00005555562034cb in aio_dispatch (ctx=0x555557d9f840)
> >>>>     at ../util/aio-posix.c:425
> >>>> #8  0x00005555562242b5 in aio_ctx_dispatch (source=0x555557d9f840,
> >>>>     callback=0x0, user_data=0x0) at ../util/async.c:361
> >>>> #9  0x00007ffff6d86559 in ?? () from /usr/lib/libglib-2.0.so.0
> >>>> #10 0x00007ffff6d86858 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
> >>>> #11 0x0000555556225bf9 in glib_pollfds_poll () at ../util/main-loop.c:287
> >>>> #12 0x0000555556225c87 in os_host_main_loop_wait (timeout=294672)
> >>>>     at ../util/main-loop.c:310
> >>>> #13 0x0000555556225db6 in main_loop_wait (nonblocking=0)
> >>>>     at ../util/main-loop.c:589
> >>>> #14 0x0000555555c0c1a3 in qemu_main_loop () at ../system/runstate.c:835
> >>>> #15 0x000055555612bd8d in qemu_default_main (opaque=0x0) at ../system/main.c:48
> >>>> #16 0x000055555612be3d in main (argc=23, argv=0x7fffffffe508) at ../system/main.c:76
> >>>>
> >>>> virtio_queue_notify_vq at hw/virtio/virtio.c:2484 [2] calls
> >>>> vq->handle_output(vdev, vq). I see "handle_output" is a function
> >>>> pointer and in this case it seems to be pointing to
> >>>> virtio_net_handle_ctrl.
> >>>>
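Right, and for the record that wiring happens at realize time:
virtio_net_device_realize() registers virtio_net_handle_ctrl as the
handler of the control virtqueue with (if I remember the line correctly)
something like:

    n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);

so any guest kick on the ctrl vq ends up in virtio_net_handle_ctrl(),
which parses the command header and dispatches to virtio_net_handle_mac()
and friends.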
> >>>>>>>> [...]
> >>>>>>>> With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
> >>>>>>>> [3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
> >>>>>>>> virtio_net_handle_mac() doesn't seem to be getting called. I haven't
> >>>>>>>> understood how the MAC address is set in VirtIONet when x-svq=false.
> >>>>>>>> Understanding this might help see why n->mac has different values
> >>>>>>>> when x-svq is false vs when it is true.
> >>>>>>>
> >>>>>>> Ok this makes sense, as x-svq=true is the one that receives the set
> >>>>>>> mac message. You should see it in L0's QEMU though, both in x-svq=on
> >>>>>>> and x-svq=off scenarios. Can you check it?
> >>>>>>
> >>>>>> L0's QEMU seems to be receiving the "set mac" message only when L1
> >>>>>> is launched with x-svq=true. With x-svq=off, I don't see any call
> >>>>>> to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
> >>>>>> in L0.
> >>>>>>
> >>>>>
> >>>>> Ok this is interesting. Let's disable control virtqueue to start with
> >>>>> something simpler:
> >>>>> -device virtio-net-pci,netdev=net0,ctrl_vq=off,...
> >>>>>
> >>>>> QEMU will start complaining about features that depend on ctrl_vq,
> >>>>> like ctrl_rx. Let's disable all of them and check this new scenario.
> >>>>>
> >>>>
> >>>> I am still investigating this part. I set ctrl_vq=off and ctrl_rx=off.
> >>>> I didn't get any errors as such about features that depend on ctrl_vq.
> >>>> However, I did notice that after booting L2 (x-svq=true as well as
> >>>> x-svq=false), no eth0 device was created. There was only a "lo" interface
> >>>> in L2. An eth0 interface is present only when L1 (L0 QEMU) is booted
> >>>> with ctrl_vq=on and ctrl_rx=on.
> >>>>
> >>>
> >>> Any error messages on the nested guest's dmesg?
> >>
> >> Oh, yes, there were error messages in the output of dmesg related to
> >> ctrl_vq. After adding the following args, there were no error messages
> >> in dmesg.
> >>
> >> -device virtio-net-pci,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off
> >>
> >> I see that the eth0 interface is also created. I am able to ping L0
> >> from L2 and vice versa as well (even with x-svq=true). This is because
> >> n->promisc is set when these features are disabled and receive_filter() [1]
> >> always returns 1.
> >>
> >>> Is it fixed when you set the same mac address on L0
> >>> virtio-net-pci and L1's?
> >>>
> >>
> >> I didn't have to set the same mac address in this case since promiscuous
> >> mode seems to be getting enabled, which allows pinging to work.
> >>
> >> There is another concept that I am a little confused about. In the case
> >> where L2 is booted with x-svq=false (and all ctrl features such as ctrl_vq,
> >> ctrl_rx, etc. are on), I am able to ping L0 from L2. When tracing
> >> receive_filter() in L0-QEMU, I see the values of n->mac and the destination
> >> mac address in the ICMP packet match [2].
> >>
> >
> > SVQ makes an effort to set the mac address at the beginning of
> > operation. L0 interprets it as "filter out all MACs except this
> > one". But SVQ cannot set the mac if ctrl_mac_addr=off, so the nic
> > receives all packets and the guest kernel needs to filter out by
> > itself.
> >
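To expand on "makes an effort": when the vhost-vdpa backend starts with
x-svq=true, QEMU replays the device state through the shadow CVQ, and that
includes a VIRTIO_NET_CTRL_MAC_ADDR_SET carrying L1's n->mac. Very
roughly, and with the caveat that the helper names and signatures vary
between QEMU versions, the flow in net/vhost-vdpa.c looks like:

    /* Hedged sketch of the CVQ mac restore; see vhost_vdpa_net_load_mac()
     * in your tree for the real code and signature. */
    static int load_mac_sketch(VhostVDPAState *s, const VirtIONet *n)
    {
        const struct iovec data = {
            .iov_base = (void *)n->mac,    /* L1's idea of the MAC */
            .iov_len  = sizeof(n->mac),
        };

        /* Sent through the shadow control virtqueue, so the L0 device
         * starts filtering out every MAC except n->mac. */
        return vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
                                       VIRTIO_NET_CTRL_MAC_ADDR_SET,
                                       &data, 1);
    }

That is the "set mac" message you only see in L0 when x-svq=true.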
> >> I haven't understood what n->mac refers to over here. MAC addresses are
> >> globally unique and so the mac address of the device in L1 should be
> >> different from that in L2.
> >
> > With vDPA, they should be the same device even if they are declared in
> > different cmdlines or layers of virtualization. If it were a physical
> > NIC, QEMU should declare the MAC of the physical NIC too.
>
> Understood. I guess the issue with x-svq=true is that the MAC address
> set in L0-QEMU's n->mac is different from the device in L2. That's why
> the packets get filtered out with x-svq=true but pinging works with
> x-svq=false.
>

Right!
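And note that this points at a practical way to keep the layers
consistent while we sort out the control plane story: give the L0 and L1
virtio-net devices the same, explicit MAC, e.g. (the netdev id here is
just an example, adjust to your setup):

    -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:57

on both layers, so n->mac in every QEMU and the address the guest uses
cannot diverge in the first place.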
> > There is a thread in the QEMU mailing list where it is discussed how
> > QEMU should influence the control plane, and maybe it would be easier
> > if QEMU just checks the device's MAC and ignores the cmdline. But then,
> > that behavior would be surprising for the rest of the vhosts, like
> > vhost-kernel. Or just emit a warning if the MAC is different than the
> > one that the device reports.
> >
>
> Got it.
>
> >> But I see L0-QEMU's n->mac is set to the mac
> >> address of the device in L2 (allowing receive_filter to accept the packet).
> >>
> >
> > That's interesting, can you check further what receive_filter and
> > virtio_net_receive_rcu do with gdb? As long as virtio_net_receive_rcu
> > flushes the packet on the receive queue, SVQ should receive it.
> >
>
> The control flow irrespective of the value of x-svq is the same up till
> the MAC address comparison in receive_filter() [1]. For x-svq=true,
> the equality check between n->mac and the packet's destination MAC address
> fails and the packet is filtered out. It is not flushed to the receive
> queue. With x-svq=false, this is not the case.
>
> On 2/4/25 11:45 PM, Eugenio Perez Martin wrote:
> > PS: Please note that you can check the packed_vq SVQ implementation
> > already without CVQ, as these features are totally orthogonal :).
> >
>
> Right. Now that I can ping with the ctrl features turned off, I think
> this should take precedence. There's another issue specific to the
> packed virtqueue case. It causes the kernel to crash. I have been
> investigating this and the situation here looks very similar to what's
> explained in Jason Wang's mail [2]. My plan of action is to apply his
> changes in L2's kernel and check if that resolves the problem.
>
> The details of the crash can be found in this mail [3].
>

If you're testing this series without changes, I think that is caused by
not implementing the packed version of vhost_svq_get_buf.

https://lists.nongnu.org/archive/html/qemu-devel/2024-12/msg01902.html
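In case it helps while you write it: the key difference is that the
packed ring has no separate used ring. The device marks descriptors as
used in place, and "used" is detected by comparing the avail/used flag
bits of the descriptor at last_used_idx against a wrap counter. A rough,
untested sketch of the idea (svq->vring_packed, svq->used_wrap_counter
and desc_state[] are hypothetical names that just follow the split-ring
code in hw/virtio/vhost-shadow-virtqueue.c):

    /* Untested sketch, not a tested implementation. */
    static VirtQueueElement *vhost_svq_get_buf_packed(VhostShadowVirtqueue *svq,
                                                      uint32_t *len)
    {
        struct vring_packed_desc *desc =
            &svq->vring_packed.vring.desc[svq->last_used_idx];
        uint16_t flags = le16_to_cpu(desc->flags);
        bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
        bool used = flags & (1 << VRING_PACKED_DESC_F_USED);

        /* Per the spec, a descriptor is used only when both flag bits
         * match the driver's current used wrap counter. */
        if (avail != svq->used_wrap_counter || used != svq->used_wrap_counter) {
            return NULL;
        }

        uint16_t id = le16_to_cpu(desc->id);
        *len = le32_to_cpu(desc->len);
        VirtQueueElement *elem = g_steal_pointer(&svq->desc_state[id].elem);

        /* Advance past the whole chain, flipping the wrap counter when
         * we run off the end of the ring. */
        svq->last_used_idx += svq->desc_state[id].ndescs;
        if (svq->last_used_idx >= svq->vring.num) {
            svq->last_used_idx -= svq->vring.num;
            svq->used_wrap_counter = !svq->used_wrap_counter;
        }
        return elem;
    }

The detach/flush path can then stay common with the split version; only
the "is there a new used descriptor?" check and the last_used_idx
advance differ.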