On Tue, Feb 4, 2025 at 7:10 PM Eugenio Perez Martin <epere...@redhat.com> wrote:
>
> On Tue, Feb 4, 2025 at 1:49 PM Sahil Siddiq <icegambi...@gmail.com> wrote:
> >
> > Hi,
> >
> > On 1/31/25 12:27 PM, Eugenio Perez Martin wrote:
> > > On Fri, Jan 31, 2025 at 6:04 AM Sahil Siddiq <icegambi...@gmail.com>
> > > wrote:
> > >> On 1/24/25 1:04 PM, Eugenio Perez Martin wrote:
> > >>> On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambi...@gmail.com>
> > >>> wrote:
> > >>>> On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
> > >>>>> On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambi...@gmail.com>
> > >>>>> wrote:
> > >>>>>> On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
> > >>>>>> [...]
> > >>>>>> Apologies for the delay in replying. It took me a while to figure
> > >>>>>> this out, but I have now understood why this doesn't work. L1 is
> > >>>>>> unable to receive messages from L0 because they get filtered out
> > >>>>>> by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
> > >>>>>> the MAC addresses.
> > >>>>>>
> > >>>>>> In L0, I have:
> > >>>>>>
> > >>>>>> $ ip a show tap0
> > >>>>>> 6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> > >>>>>> state UNKNOWN group default qlen 1000
> > >>>>>>     link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
> > >>>>>>     inet 111.1.1.1/24 scope global tap0
> > >>>>>>        valid_lft forever preferred_lft forever
> > >>>>>>     inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
> > >>>>>>        valid_lft forever preferred_lft forever
> > >>>>>>
> > >>>>>> In L1:
> > >>>>>>
> > >>>>>> # ip a show eth0
> > >>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> > >>>>>> state UP group default qlen 1000
> > >>>>>>     link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
> > >>>>>>     altname enp0s2
> > >>>>>>     inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic
> > >>>>>> noprefixroute eth0
> > >>>>>>        valid_lft 83455sec preferred_lft 83455sec
> > >>>>>>     inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic
> > >>>>>> noprefixroute
> > >>>>>>        valid_lft 86064sec preferred_lft 14064sec
> > >>>>>>     inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
> > >>>>>>        valid_lft forever preferred_lft forever
> > >>>>>>
> > >>>>>> I'll call this L1-eth0.
> > >>>>>>
> > >>>>>> In L2:
> > >>>>>> # ip a show eth0
> > >>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> > >>>>>> state UP gro0
> > >>>>>>     link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
> > >>>>>>     altname enp0s7
> > >>>>>>     inet 111.1.1.2/24 scope global eth0
> > >>>>>>        valid_lft forever preferred_lft forever
> > >>>>>>
> > >>>>>> I'll call this L2-eth0.
> > >>>>>>
> > >>>>>> Apart from eth0, lo is the only other device in both L1 and L2.
> > >>>>>>
> > >>>>>> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
> > >>>>>> as its destination address. When booting L2 with x-svq=false, the
> > >>>>>> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
> > >>>>>> the frames and passes them on to L2 and pinging works [2].
> > >>>>>>
> > >>>>>
> > >>>>> So this behavior is interesting by itself. But L1's kernel net system
> > >>>>> should not receive anything. As I read it, even if it receives it, it
> > >>>>> should not forward the frame to L2 as it is in a different subnet. Are
> > >>>>> you able to read it using tcpdump on L1?
> > >>>>
> > >>>> I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
> > >>>> that were directed at L2 even though L2 was able to receive them.
> > >>>> Similarly, it didn't capture any packets that were sent from L2 to
> > >>>> L0. This is when L2 is launched with x-svq=false.
> > >>>>
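(A side note for anyone following along: the filtering decision discussed above
happens in L0's QEMU, not in L1's kernel, which is why tcpdump inside L1 sees
nothing. As a rough illustration only, the unicast case of
hw/net/virtio-net.c:receive_filter comes down to a comparison like the sketch
below. This is a simplified, self-contained sketch of the idea, not the exact
QEMU code; the struct is just a stand-in for the relevant VirtIONet fields.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the VirtIONet fields that matter for this discussion. */
struct nic_state {
    bool promisc;      /* set when the ctrl_rx features are disabled */
    uint8_t mac[6];    /* n->mac, the address the device filters on */
};

/* Simplified sketch of the unicast decision receive_filter makes. */
bool accept_unicast(const struct nic_state *n, const uint8_t *dst)
{
    if (n->promisc) {
        return true;                     /* promiscuous: accept everything */
    }
    return memcmp(dst, n->mac, 6) == 0;  /* otherwise dst must match n->mac */
}

With x-svq=true, n->mac ends up holding L1-eth0's address, so this comparison
fails for frames addressed to L2-eth0 and they are dropped, which seems to
match what you are seeing.)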
> > >>>
> > >>> That's right. The virtio dataplane goes directly from L0 to L2, you
> > >>> should not be able to see any packets in the net of L1.
> > >>
> > >> I am a little confused here. Since vhost=off is set in L0's QEMU
> > >> (which is used to boot L1), I am able to inspect the packets when
> > >> tracing/debugging receive_filter in hw/net/virtio-net.c. [1] Does
> > >> this mean the dataplane from L0 to L2 passes through L0's QEMU
> > >> (so L0 QEMU is aware of what's going on), but bypasses L1 completely
> > >> so L1's kernel does not know what packets are being sent/received?
> > >>
> > >
> > > That's right. We're saving processing power and context switches that way
> > > :).
> >
> > Got it. I have understood this part. In a previous mail (also present
> > above):
> >
> > >>>>> On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq wrote:
> > >>>>>> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
> > >>>>>> as its destination address. When booting L2 with x-svq=false, the
> > >>>>>> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
> > >>>>>> the frames and passes them on to L2 and pinging works [2].
> > >>>>>>
> >
> > I was a little unclear in my explanation. I meant to say the frame received
> > by L0-QEMU (which is running L1).
> >
> > >>>> With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
> > >>>> receive_filter allows L2 to receive packets from L0. I added
> > >>>> the following line just before line 1771 [1] to check this out.
> > >>>>
> > >>>> n->mac[5] = 0x57;
> > >>>>
> > >>>
> > >>> That's very interesting. Let me answer all the gdb questions below and
> > >>> we can debug it deeper :).
> > >>
> > >> Thank you for the primer on using gdb with QEMU. I am able to debug
> > >> QEMU now.
> > >>
> > >>>>> Maybe we can make the scenario clearer by telling which virtio-net
> > >>>>> device is which with virtio_net_pci,mac=XX:... ?
> > >>>>>
> > >>>>>> However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
> > >>>>>> (LSB = 56) in virtio_net_handle_mac() [3].
> > >>>>>
> > >>>>> Can you tell with gdb bt if this function is called from net or the
> > >>>>> SVQ subsystem?
> > >>>>
> > >>
> > >> It looks like the function is being called from net.
> > >>
> > >> (gdb) bt
> > >> #0  virtio_net_handle_mac (n=0x15622425e, cmd=85 'U',
> > >>     iov=0x555558865980, iov_cnt=1476792840) at ../hw/net/virtio-net.c:1098
> > >> #1  0x0000555555e5920b in virtio_net_handle_ctrl_iov
> > >>     (vdev=0x555558fdacd0, in_sg=0x5555580611f8, in_num=1,
> > >>     out_sg=0x555558061208, out_num=1) at ../hw/net/virtio-net.c:1581
> > >> #2  0x0000555555e593a0 in virtio_net_handle_ctrl (vdev=0x555558fdacd0,
> > >>     vq=0x555558fe7730) at ../hw/net/virtio-net.c:1610
> > >> #3  0x0000555555e9a7d8 in virtio_queue_notify_vq (vq=0x555558fe7730) at
> > >>     ../hw/virtio/virtio.c:2484
> > >> #4  0x0000555555e9dffb in virtio_queue_host_notifier_read
> > >>     (n=0x555558fe77a4) at ../hw/virtio/virtio.c:3869
> > >> #5  0x000055555620329f in aio_dispatch_handler (ctx=0x555557d9f840,
> > >>     node=0x7fffdca7ba80) at ../util/aio-posix.c:373
> > >> #6  0x000055555620346f in aio_dispatch_handlers (ctx=0x555557d9f840) at
> > >>     ../util/aio-posix.c:415
> > >> #7  0x00005555562034cb in aio_dispatch (ctx=0x555557d9f840) at
> > >>     ../util/aio-posix.c:425
> > >> #8  0x00005555562242b5 in aio_ctx_dispatch (source=0x555557d9f840,
> > >>     callback=0x0, user_data=0x0) at ../util/async.c:361
> > >> #9  0x00007ffff6d86559 in ?? () from /usr/lib/libglib-2.0.so.0
> > >> #10 0x00007ffff6d86858 in g_main_context_dispatch () from
> > >>     /usr/lib/libglib-2.0.so.0
> > >> #11 0x0000555556225bf9 in glib_pollfds_poll () at ../util/main-loop.c:287
> > >> #12 0x0000555556225c87 in os_host_main_loop_wait (timeout=294672) at
> > >>     ../util/main-loop.c:310
> > >> #13 0x0000555556225db6 in main_loop_wait (nonblocking=0) at
> > >>     ../util/main-loop.c:589
> > >> #14 0x0000555555c0c1a3 in qemu_main_loop () at ../system/runstate.c:835
> > >> #15 0x000055555612bd8d in qemu_default_main (opaque=0x0) at
> > >>     ../system/main.c:48
> > >> #16 0x000055555612be3d in main (argc=23, argv=0x7fffffffe508) at
> > >>     ../system/main.c:76
> > >>
> > >> virtio_queue_notify_vq at hw/virtio/virtio.c:2484 [2] calls
> > >> vq->handle_output(vdev, vq). I see "handle_output" is a function
> > >> pointer and in this case it seems to be pointing to
> > >> virtio_net_handle_ctrl.
> > >>
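(Regarding where that function pointer comes from: if I recall the realize path
correctly, the control virtqueue handler is installed when the device is
realized in hw/net/virtio-net.c, roughly like this:

    n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);

virtio_add_queue() stores that callback in vq->handle_output, which is what
virtio_queue_notify_vq() ends up calling in frame #3 of the backtrace above.
Treat the exact queue size and surrounding context as from memory rather than
checked against the current tree.)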
> > >>>>>> [...]
> > >>>>>> With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
> > >>>>>> [3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
> > >>>>>> virtio_net_handle_mac() doesn't seem to be getting called. I haven't
> > >>>>>> understood how the MAC address is set in VirtIONet when x-svq=false.
> > >>>>>> Understanding this might help see why n->mac has different values
> > >>>>>> when x-svq is false vs when it is true.
> > >>>>>
> > >>>>> Ok this makes sense, as x-svq=true is the one that receives the set
> > >>>>> mac message. You should see it in L0's QEMU though, both in x-svq=on
> > >>>>> and x-svq=off scenarios. Can you check it?
> > >>>>
> > >>>> L0's QEMU seems to be receiving the "set mac" message only when L1
> > >>>> is launched with x-svq=true. With x-svq=off, I don't see any call
> > >>>> to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
> > >>>> in L0.
> > >>>>
> > >>>
> > >>> Ok this is interesting. Let's disable control virtqueue to start with
> > >>> something simpler:
> > >>> device virtio-net-pci,netdev=net0,ctrl_vq=off,...
> > >>>
> > >>> QEMU will start complaining about features that depend on ctrl_vq,
> > >>> like ctrl_rx. Let's disable all of them and check this new scenario.
> > >>>
> > >>
> > >> I am still investigating this part. I set ctrl_vq=off and ctrl_rx=off.
> > >> I didn't get any errors as such about features that depend on ctrl_vq.
> > >> However, I did notice that after booting L2 (x-svq=true as well as
> > >> x-svq=false), no eth0 device was created. There was only a "lo" interface
> > >> in L2. An eth0 interface is present only when L1 (L0 QEMU) is booted
> > >> with ctrl_vq=on and ctrl_rx=on.
> > >>
> > >
> > > Any error messages on the nested guest's dmesg?
> >
> > Oh, yes, there were error messages in the output of dmesg related to
> > ctrl_vq. After adding the following args, there were no error messages
> > in dmesg.
> >
> > -device
> > virtio-net-pci,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off
> >
> > I see that the eth0 interface is also created. I am able to ping L0
> > from L2 and vice versa as well (even with x-svq=true). This is because
> > n->promisc is set when these features are disabled and receive_filter() [1]
> > always returns 1.
> >
> > > Is it fixed when you set the same mac address on L0
> > > virtio-net-pci and L1's?
> > >
> >
> > I didn't have to set the same mac address in this case since promiscuous
> > mode seems to be getting enabled, which allows pinging to work.
> >
> > There is another concept that I am a little confused about. In the case
> > where L2 is booted with x-svq=false (and all ctrl features such as ctrl_vq,
> > ctrl_rx, etc. are on), I am able to ping L0 from L2. When tracing
> > receive_filter() in L0-QEMU, I see the values of n->mac and the destination
> > mac address in the ICMP packet match [2].
> >
>
> SVQ makes an effort to set the mac address at the beginning of
> operation. The L0 side interprets it as "filter out all MACs except this
> one". But SVQ cannot set the mac if ctrl_mac_addr=off, so the NIC
> receives all packets and the guest kernel needs to filter them out by
> itself.
>
> > I haven't understood what n->mac refers to over here. MAC addresses are
> > globally unique and so the mac address of the device in L1 should be
> > different from that in L2.
>
> With vDPA, they should be the same device even if they are declared in
> different cmdlines or layers of virtualization. If it were a physical
> NIC, QEMU should declare the MAC of the physical NIC too.
>
> There is a thread on the QEMU mail list discussing how QEMU should
> influence the control plane, and maybe it would be easier if QEMU
> just checked the device's MAC and ignored the cmdline. But then, that
> behavior would be surprising for the rest of the vhosts, like vhost-kernel.
> Or just emit a warning if the MAC is different from the one that the
> device reports.
>
> > But I see L0-QEMU's n->mac is set to the mac
> > address of the device in L2 (allowing receive_filter to accept the packet).
> >
>
> That's interesting, can you check further what receive_filter and
> virtio_net_receive_rcu do with gdb? As long as virtio_net_receive_rcu
> flushes the packet on the receive queue, SVQ should receive it.
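Something along these lines should be enough to see both sides (run against
L0's QEMU with a debug build; the parameter and field names are from my reading
of the code and may differ slightly in your tree):

(gdb) break receive_filter
(gdb) break virtio_net_receive_rcu
(gdb) continue

Then, when the receive_filter breakpoint hits, compare the device MAC with the
destination MAC of the incoming frame:

(gdb) print /x n->mac
(gdb) x/6xb buf + n->host_hdr_len
(gdb) finish

"finish" shows receive_filter's return value; as far as I can tell, a return of
0 means virtio_net_receive_rcu drops the frame without flushing anything to the
receive queue, so SVQ would never see it.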
PS: Please note that you can check the packed_vq SVQ implementation already without CVQ, as these features are totally orthogonal :).
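For example, something along these lines should exercise the packed ring with
SVQ and no control virtqueue (untested as written here; the vhost-vdpa device
node is a placeholder and the MAC is just the L2-eth0 one from this thread, so
adjust both to your setup):

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0,x-svq=true \
-device virtio-net-pci,netdev=vdpa0,packed=on,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off,mac=52:54:00:12:34:57

Both the guest driver and the parent vDPA device need to offer
VIRTIO_F_RING_PACKED for the feature negotiation to succeed, of course.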