On Tue, Feb 4, 2025 at 1:49 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
>
> Hi,
>
> On 1/31/25 12:27 PM, Eugenio Perez Martin wrote:
> > On Fri, Jan 31, 2025 at 6:04 AM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >> On 1/24/25 1:04 PM, Eugenio Perez Martin wrote:
> >>> On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambi...@gmail.com>
> >>> wrote:
> >>>> On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
> >>>>> On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambi...@gmail.com>
> >>>>> wrote:
> >>>>>> On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
> >>>>>> [...]
> >>>>>> Apologies for the delay in replying. It took me a while to figure
> >>>>>> this out, but I have now understood why this doesn't work. L1 is
> >>>>>> unable to receive messages from L0 because they get filtered out
> >>>>>> by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
> >>>>>> the MAC addresses.
> >>>>>>
> >>>>>> In L0, I have:
> >>>>>>
> >>>>>> $ ip a show tap0
> >>>>>> 6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> >>>>>> state UNKNOWN group default qlen 1000
> >>>>>> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
> >>>>>> inet 111.1.1.1/24 scope global tap0
> >>>>>> valid_lft forever preferred_lft forever
> >>>>>> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
> >>>>>> valid_lft forever preferred_lft forever
> >>>>>>
> >>>>>> In L1:
> >>>>>>
> >>>>>> # ip a show eth0
> >>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> >>>>>> state UP group default qlen 1000
> >>>>>> link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
> >>>>>> altname enp0s2
> >>>>>> inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic
> >>>>>> noprefixroute eth0
> >>>>>> valid_lft 83455sec preferred_lft 83455sec
> >>>>>> inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic
> >>>>>> noprefixroute
> >>>>>> valid_lft 86064sec preferred_lft 14064sec
> >>>>>> inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
> >>>>>> valid_lft forever preferred_lft forever
> >>>>>>
> >>>>>> I'll call this L1-eth0.
> >>>>>>
> >>>>>> In L2:
> >>>>>> # ip a show eth0
> >>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> >>>>>> state UP gro0
> >>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
> >>>>>> altname enp0s7
> >>>>>> inet 111.1.1.2/24 scope global eth0
> >>>>>> valid_lft forever preferred_lft forever
> >>>>>>
> >>>>>> I'll call this L2-eth0.
> >>>>>>
> >>>>>> Apart from eth0, lo is the only other device in both L1 and L2.
> >>>>>>
> >>>>>> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
> >>>>>> as its destination address. When booting L2 with x-svq=false, the
> >>>>>> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
> >>>>>> the frames and passes them on to L2 and pinging works [2].
> >>>>>>
> >>>>>
> >>>>> So this behavior is interesting by itself. But L1's kernel net system
> >>>>> should not receive anything. As I read it, even if it receives it, it
> >>>>> should not forward the frame to L2 as it is in a different subnet. Are
> >>>>> you able to read it using tcpdump on L1?
> >>>>
> >>>> I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
> >>>> that were directed at L2 even though L2 was able to receive them.
> >>>> Similarly, it didn't capture any packets that were sent from L2 to
> >>>> L0. This is when L2 is launched with x-svq=false.
> >>>>
> >>>
> >>> That's right. The virtio dataplane goes directly from L0 to L2, you
> >>> should not be able to see any packets in the net of L1.
> >>
> >> I am a little confused here. Since vhost=off is set in L0's QEMU
> >> (which is used to boot L1), I am able to inspect the packets when
> >> tracing/debugging receive_filter in hw/net/virtio-net.c. [1] Does
> >> this mean the dataplane from L0 to L2 passes through L0's QEMU
> >> (so L0 QEMU is aware of what's going on), but bypasses L1 completely
> >> so L1's kernel does not know what packets are being sent/received.
> >>
> >
> > That's right. We're saving processing power and context switches that way
> > :).
>
> Got it. I have understood this part. In a previous mail (also present above):
>
> >>>>> On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq wrote:
> >>>>>> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
> >>>>>> as its destination address. When booting L2 with x-svq=false, the
> >>>>>> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
> >>>>>> the frames and passes them on to L2 and pinging works [2].
> >>>>>>
>
> I was a little unclear in my explanation. I meant to say the frame received by
> L0-QEMU (which is running L1).
>
> >>>> With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
> >>>> receive_filter allows L2 to receive packets from L0. I added
> >>>> the following line just before line 1771 [1] to check this out.
> >>>>
> >>>> n->mac[5] = 0x57;
> >>>>
> >>>
> >>> That's very interesting. Let me answer all the gdb questions below and
> >>> we can debug it deeper :).
> >>>
> >>
> >> Thank you for the primer on using gdb with QEMU. I am able to debug
> >> QEMU now.
> >>
> >>>>> Maybe we can make the scenario clearer by telling which virtio-net
> >>>>> device is which with virtio_net_pci,mac=XX:... ?
> >>>>>
> >>>>>> However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
> >>>>>> (LSB = 56) in virtio_net_handle_mac() [3].
> >>>>>
> >>>>> Can you tell with gdb bt if this function is called from net or the
> >>>>> SVQ subsystem?
> >>>>
> >>
> >> It looks like the function is being called from net.
> >>
> >> (gdb) bt
> >> #0 virtio_net_handle_mac (n=0x15622425e, cmd=85 'U', iov=0x555558865980,
> >> iov_cnt=1476792840) at ../hw/net/virtio-net.c:1098
> >> #1 0x0000555555e5920b in virtio_net_handle_ctrl_iov (vdev=0x555558fdacd0,
> >> in_sg=0x5555580611f8, in_num=1, out_sg=0x555558061208,
> >> out_num=1) at ../hw/net/virtio-net.c:1581
> >> #2 0x0000555555e593a0 in virtio_net_handle_ctrl (vdev=0x555558fdacd0,
> >> vq=0x555558fe7730) at ../hw/net/virtio-net.c:1610
> >> #3 0x0000555555e9a7d8 in virtio_queue_notify_vq (vq=0x555558fe7730) at
> >> ../hw/virtio/virtio.c:2484
> >> #4 0x0000555555e9dffb in virtio_queue_host_notifier_read
> >> (n=0x555558fe77a4) at ../hw/virtio/virtio.c:3869
> >> #5 0x000055555620329f in aio_dispatch_handler (ctx=0x555557d9f840,
> >> node=0x7fffdca7ba80) at ../util/aio-posix.c:373
> >> #6 0x000055555620346f in aio_dispatch_handlers (ctx=0x555557d9f840) at
> >> ../util/aio-posix.c:415
> >> #7 0x00005555562034cb in aio_dispatch (ctx=0x555557d9f840) at
> >> ../util/aio-posix.c:425
> >> #8 0x00005555562242b5 in aio_ctx_dispatch (source=0x555557d9f840,
> >> callback=0x0, user_data=0x0) at ../util/async.c:361
> >> #9 0x00007ffff6d86559 in ?? () from /usr/lib/libglib-2.0.so.0
> >> #10 0x00007ffff6d86858 in g_main_context_dispatch () from
> >> /usr/lib/libglib-2.0.so.0
> >> #11 0x0000555556225bf9 in glib_pollfds_poll () at ../util/main-loop.c:287
> >> #12 0x0000555556225c87 in os_host_main_loop_wait (timeout=294672) at
> >> ../util/main-loop.c:310
> >> #13 0x0000555556225db6 in main_loop_wait (nonblocking=0) at
> >> ../util/main-loop.c:589
> >> #14 0x0000555555c0c1a3 in qemu_main_loop () at ../system/runstate.c:835
> >> #15 0x000055555612bd8d in qemu_default_main (opaque=0x0) at
> >> ../system/main.c:48
> >> #16 0x000055555612be3d in main (argc=23, argv=0x7fffffffe508) at
> >> ../system/main.c:76
> >>
> >> virtio_queue_notify_vq at hw/virtio/virtio.c:2484 [2] calls
> >> vq->handle_output(vdev, vq). I see "handle_output" is a function
> >> pointer and in this case it seems to be pointing to
> >> virtio_net_handle_ctrl.
> >>
> >>>>>> [...]
> >>>>>> With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
> >>>>>> [3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
> >>>>>> virtio_net_handle_mac() doesn't seem to be getting called. I haven't
> >>>>>> understood how the MAC address is set in VirtIONet when x-svq=false.
> >>>>>> Understanding this might help see why n->mac has different values
> >>>>>> when x-svq is false vs when it is true.
> >>>>>
> >>>>> Ok this makes sense, as x-svq=true is the one that receives the set
> >>>>> mac message. You should see it in L0's QEMU though, both in x-svq=on
> >>>>> and x-svq=off scenarios. Can you check it?
> >>>>
> >>>> L0's QEMU seems to be receiving the "set mac" message only when L1
> >>>> is launched with x-svq=true. With x-svq=off, I don't see any call
> >>>> to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
> >>>> in L0.
> >>>>
> >>>
> >>> Ok this is interesting. Let's disable control virtqueue to start with
> >>> something simpler:
> >>> device virtio-net-pci,netdev=net0,ctrl_vq=off,...
> >>>
> >>> QEMU will start complaining about features that depend on ctrl_vq,
> >>> like ctrl_rx. Let's disable all of them and check this new scenario.
> >>>
> >>
> >> I am still investigating this part. I set ctrl_vq=off and ctrl_rx=off.
> >> I didn't get any errors as such about features that depend on ctrl_vq.
> >> However, I did notice that after booting L2 (x-svq=true as well as
> >> x-svq=false), no eth0 device was created. There was only a "lo" interface
> >> in L2. An eth0 interface is present only when L1 (L0 QEMU) is booted
> >> with ctrl_vq=on and ctrl_rx=on.
> >>
> >
> > Any error messages on the nested guest's dmesg?
>
> Oh, yes, there were error messages in the output of dmesg related to
> ctrl_vq. After adding the following args, there were no error messages
> in dmesg.
>
> -device virtio-net-pci,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off
>
> I see that the eth0 interface is also created. I am able to ping L0
> from L2 and vice versa as well (even with x-svq=true). This is because
> n->promisc is set when these features are disabled and receive_filter() [1]
> always returns 1.
>
> > Is it fixed when you set the same mac address on L0
> > virtio-net-pci and L1's?
> >
>
> I didn't have to set the same mac address in this case since promiscuous
> mode seems to be getting enabled which allows pinging to work.
>
> There is another concept that I am a little confused about. In the case
> where L2 is booted with x-svq=false (and all ctrl features such as ctrl_vq,
> ctrl_rx, etc. are on), I am able to ping L0 from L2. When tracing
> receive_filter() in L0-QEMU, I see the values of n->mac and the destination
> mac address in the ICMP packet match [2].
>
SVQ makes an effort to set the MAC address at the beginning of
operation. L0 interprets this as "filter out all MACs except this one".
But SVQ cannot set the MAC if ctrl_mac_addr=off, so the NIC receives all
packets and the guest kernel needs to filter them by itself.

> I haven't understood what n->mac refers to over here. MAC addresses are
> globally unique and so the mac address of the device in L1 should be
> different from that in L2.

With vDPA, they should be the same device even if they are declared in
different cmdlines or layers of virtualization. If it were a physical
NIC, QEMU should declare the MAC of the physical NIC too.

There is a thread on the QEMU mailing list discussing how QEMU should
influence the control plane. Maybe it would be easier if QEMU just
checked the device's MAC and ignored the cmdline, but that behavior
would be surprising for the rest of the vhost backends, like
vhost-kernel. Another option is to just emit a warning if the MAC is
different from the one that the device reports.

> But I see L0-QEMU's n->mac is set to the mac
> address of the device in L2 (allowing receive_filter to accept the packet).
>

That's interesting. Can you check further with gdb what receive_filter
and virtio_net_receive_rcu do? As long as virtio_net_receive_rcu flushes
the packet on the receive queue, SVQ should receive it.
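
The filtering behavior discussed in this thread can be sketched as a toy
model. This is only an illustration under stated assumptions, not the
code in hw/net/virtio-net.c: struct toy_nic and toy_receive_filter() are
made-up names, and the real receive_filter() also handles VLANs,
multicast and the MAC table, all of which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, stripped-down stand-in for the filter state in VirtIONet. */
struct toy_nic {
    bool promisc;           /* accept everything, no MAC filtering */
    uint8_t mac[ETH_ALEN];  /* the only unicast address accepted otherwise */
};

/* Returns true if the frame would be delivered to the guest. */
static bool toy_receive_filter(const struct toy_nic *n,
                               const uint8_t *frame, size_t len)
{
    static const uint8_t bcast[ETH_ALEN] = {
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };

    assert(n != NULL && frame != NULL);

    if (len < ETH_ALEN) {
        return false;   /* too short to hold a destination MAC */
    }
    if (n->promisc) {
        return true;    /* the case reported with the ctrl_* features off */
    }
    if (memcmp(frame, bcast, ETH_ALEN) == 0) {
        return true;    /* broadcast always passes in this simplified model */
    }
    /* Unicast: the destination (first 6 bytes) must match the device MAC. */
    return memcmp(frame, n->mac, ETH_ALEN) == 0;
}
```

In this model, a device whose mac ends in 0x56 drops a frame addressed to
52:54:00:12:34:57, while setting mac[5] = 0x57 or enabling promisc lets
it through, which mirrors the two experiments reported earlier in the
thread.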