On 8/1/22 13:57, Claudio Fontana wrote: > On 7/30/22 18:23, Claudio Fontana wrote: >> On 7/30/22 18:17, Claudio Fontana wrote: >>> Hello all, >>> >>> with the latest DPDK, openvswitch and qemu >>> >>> DPDK tag v22.07 >>> openvswitch tag v2.17.1 >>> qemu v7.1-git 22.07.2022 >>> >>> and a DPDK setup which involves also an ubuntu guest with DPDK 16.11 >>> test-pmd application (but also verified with DPDK 19.x), >>> with an external traffic generator to cause some load, >>> >>> I am able to cause a segfault in OVS (ovs-vswitchd) inside the DPDK >>> libraries by doing (from the guest): >>> >>> bind the device, start testpmd, >>> SIGKILL of testpmd, >>> immediately restart testpmd, >>> rinse and repeat. >>> >>> Once every few restarts, the following segfault happens (may take anything >>> from a few seconds to minutes): >>> >>> >>> Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault. >>> [Switching to Thread 0x7f64e5e6b700 (LWP 141373)] >>> rte_mov128blocks (n=2048, src=0xc <error: Cannot access memory at address >>> 0xc>, dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at >>> ../lib/eal/x86/include/rte_memcpy.h:384 >>> 384 ../lib/eal/x86/include/rte_memcpy.h: No such file or directory. 
>>> (gdb) bt >>> #0 rte_mov128blocks (n=2048, src=0xc <error: Cannot access memory at >>> address 0xc>, >>> dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at >>> ../lib/eal/x86/include/rte_memcpy.h:384 >>> #1 rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480) at >>> ../lib/eal/x86/include/rte_memcpy.h:484 >>> #2 rte_memcpy (n=2048, src=0xc, dst=<optimized out>) at >>> ../lib/eal/x86/include/rte_memcpy.h:851 >>> #3 sync_fill_seg (to_desc=false, cpy_len=2048, buf_iova=<optimized out>, >>> buf_addr=12, mbuf_offset=0, m=0x150da4140, >>> vq=0x2200400680, dev=0x2200d3d740) at ../lib/vhost/virtio_net.c:1119 >>> #4 desc_to_mbuf (is_async=false, slot_idx=0, legacy_ol_flags=true, >>> mbuf_pool=0x17fe7df00, m=0x150da4140, nr_vec=<optimized out>, >>> buf_vec=0x7f64e5e67ca0, vq=0x2200400680, dev=0x2200d3d740) at >>> ../lib/vhost/virtio_net.c:2747 >>> #5 virtio_dev_tx_split (legacy_ol_flags=true, count=<optimized out>, >>> count@entry=0, pkts=pkts@entry=0x0, >>> mbuf_pool=mbuf_pool@entry=0x150da4140, vq=vq@entry=0xe5e67d34, >>> dev=dev@entry=0x7f64e5e694d0) at ../lib/vhost/virtio_net.c:2943 >>> #6 virtio_dev_tx_split_legacy (dev=dev@entry=0x2200d3d740, >>> vq=vq@entry=0x2200400680, mbuf_pool=mbuf_pool@entry=0x17fe7df00, >>> pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at >>> ../lib/vhost/virtio_net.c:2979 >>> #7 0x00007f676fea0fef in rte_vhost_dequeue_burst (vid=vid@entry=0, >>> queue_id=queue_id@entry=1, mbuf_pool=0x17fe7df00, >>> pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at >>> ../lib/vhost/virtio_net.c:3331 >>> #8 0x00007f6772005a62 in netdev_dpdk_vhost_rxq_recv (rxq=<optimized out>, >>> batch=0x7f64e5e695f0, qfill=0x0) >>> at ../lib/netdev-dpdk.c:2393 >>> #9 0x00007f6771f38116 in netdev_rxq_recv (rx=<optimized out>, >>> batch=batch@entry=0x7f64e5e695f0, qfill=<optimized out>) >>> at ../lib/netdev.c:727 >>> #10 0x00007f6771f03d96 in dp_netdev_process_rxq_port >>> (pmd=pmd@entry=0x7f64e5e6c010, rxq=0x254d730, port_no=2) >>> at 
../lib/dpif-netdev.c:5317 >>> #11 0x00007f6771f04239 in pmd_thread_main (f_=<optimized out>) at >>> ../lib/dpif-netdev.c:6945 >>> #12 0x00007f6771f92aff in ovsthread_wrapper (aux_=<optimized out>) at >>> ../lib/ovs-thread.c:422 >>> #13 0x00007f6771c1b6ea in start_thread () from /lib64/libpthread.so.0 >>> #14 0x00007f6771933a8f in clone () from /lib64/libc.so.6 >>> >>> When run in gdb as shown above, ovs-vswitchd on the host gets a SIGSEGV and >>> drops to gdb as shown above, >>> so as a result QEMU stops when trying to read a response from ovs as such: >>> >>> 0 0x00007f0a093991e9 in poll () from target:/lib64/libc.so.6 >>> #1 0x00007f0a0b06c9a9 in ?? () from target:/usr/lib64/libglib-2.0.so.0 >>> #2 0x00007f0a0b06ccf2 in g_main_loop_run () from >>> target:/usr/lib64/libglib-2.0.so.0 >>> #3 0x0000561a5cd04747 in vhost_user_read (dev=dev@entry=0x561a5e640df0, >>> msg=msg@entry=0x7f09ff7fd160) >>> at ../hw/virtio/vhost-user.c:406 >>> #4 0x0000561a5cd04c7e in vhost_user_get_vring_base (dev=0x561a5e640df0, >>> ring=0x7f09ff7fd428) >>> at ../hw/virtio/vhost-user.c:1261 >>> #5 0x0000561a5cd0043f in vhost_virtqueue_stop >>> (dev=dev@entry=0x561a5e640df0, vdev=vdev@entry=0x561a5f78ae50, >>> vq=0x561a5e641070, idx=0) at ../hw/virtio/vhost.c:1216 >>> #6 0x0000561a5cd034fa in vhost_dev_stop (hdev=hdev@entry=0x561a5e640df0, >>> vdev=vdev@entry=0x561a5f78ae50) >>> at ../hw/virtio/vhost.c:1872 >>> #7 0x0000561a5cb623fa in vhost_net_stop_one (net=0x561a5e640df0, >>> dev=dev@entry=0x561a5f78ae50) >>> at ../hw/net/vhost_net.c:315 >>> #8 0x0000561a5cb6295e in vhost_net_stop (dev=dev@entry=0x561a5f78ae50, >>> ncs=0x561a5f808970, >>> data_queue_pairs=data_queue_pairs@entry=4, cvq=cvq@entry=0) at >>> ../hw/net/vhost_net.c:427 >>> #9 0x0000561a5cccef79 in virtio_net_vhost_status (status=<optimized out>, >>> n=0x561a5f78ae50) >>> at ../hw/net/virtio-net.c:298 >>> #10 virtio_net_set_status (vdev=0x561a5f78ae50, status=0 '\000') at >>> ../hw/net/virtio-net.c:372 >>> #11 0x0000561a5ccfb36b 
in virtio_set_status >>> (vdev=vdev@entry=0x561a5f78ae50, val=val@entry=0 '\000') >>> at ../hw/virtio/virtio.c:1997 >>> #12 0x0000561a5cbfff29 in virtio_pci_common_write (opaque=0x561a5f782a90, >>> addr=<optimized out>, val=0, >>> size=<optimized out>) at ../hw/virtio/virtio-pci.c:1294 >>> #13 0x0000561a5cd25fbf in memory_region_write_accessor (mr=0x561a5f7835c0, >>> addr=20, value=<optimized out>, size=1, >>> shift=<optimized out>, mask=<optimized out>, attrs=...) at >>> ../softmmu/memory.c:492 >>> #14 0x0000561a5cd22950 in access_with_adjusted_size (addr=addr@entry=20, >>> value=value@entry=0x7f09ff7fd6f8, >>> size=size@entry=1, access_size_min=<optimized out>, >>> access_size_max=<optimized out>, >>> access_fn=access_fn@entry=0x561a5cd25f6d >>> <memory_region_write_accessor>, mr=0x561a5f7835c0, attrs=...) >>> >>> Some additional info about the setup: >>> >>> Host is running SUSE Linux Enterprise Server 15SP3, with DPDK, openvswitch >>> and QEMU packages replaced with latest upstream releases. 
>>> >>> Guest is running the following libvirt VM: >>> >>> <domain type='kvm' id='11'> >>> <name>ubuntu20.04-3</name> >>> <uuid>971953a5-bd24-4856-a117-87c791a09580</uuid> >>> <metadata> >>> <libosinfo:libosinfo >>> xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> >>> <libosinfo:os id="http://ubuntu.com/ubuntu/20.04"/> >>> </libosinfo:libosinfo> >>> </metadata> >>> <memory unit='KiB'>4194304</memory> >>> <currentMemory unit='KiB'>4194304</currentMemory> >>> <memoryBacking> >>> <hugepages> >>> <page size='1048576' unit='KiB' nodeset='0'/> >>> </hugepages> >>> </memoryBacking> >>> <vcpu placement='static'>5</vcpu> >>> <resource> >>> <partition>/machine</partition> >>> </resource> >>> <os> >>> <type arch='x86_64' machine='pc-q35-5.2'>hvm</type> >>> <boot dev='hd'/> >>> </os> >>> <features> >>> <acpi/> >>> <apic/> >>> <vmport state='off'/> >>> </features> >>> <cpu mode='custom' match='exact' check='full'> >>> <model fallback='forbid'>IvyBridge-IBRS</model> >>> <vendor>Intel</vendor> >>> <topology sockets='1' dies='1' cores='5' threads='1'/> >>> <feature policy='require' name='ss'/> >>> <feature policy='require' name='vmx'/> >>> <feature policy='require' name='pcid'/> >>> <feature policy='require' name='hypervisor'/> >>> <feature policy='require' name='arat'/> >>> <feature policy='require' name='tsc_adjust'/> >>> <feature policy='require' name='umip'/> >>> <feature policy='require' name='md-clear'/> >>> <feature policy='require' name='stibp'/> >>> <feature policy='require' name='arch-capabilities'/> >>> <feature policy='require' name='ssbd'/> >>> <feature policy='require' name='avx2'/> >>> <feature policy='require' name='xsaveopt'/> >>> <feature policy='require' name='pdpe1gb'/> >>> <feature policy='require' name='skip-l1dfl-vmentry'/> >>> <feature policy='require' name='pschange-mc-no'/> >>> <numa> >>> <cell id='0' cpus='0-4' memory='4194304' unit='KiB' >>> memAccess='shared'/> >>> </numa> >>> </cpu> >>> <clock offset='utc'> >>> <timer name='rtc' 
tickpolicy='catchup'/> >>> <timer name='pit' tickpolicy='delay'/> >>> <timer name='hpet' present='no'/> >>> </clock> >>> <on_poweroff>destroy</on_poweroff> >>> <on_reboot>restart</on_reboot> >>> <on_crash>destroy</on_crash> >>> <pm> >>> <suspend-to-mem enabled='no'/> >>> <suspend-to-disk enabled='no'/> >>> </pm> >>> <devices> >>> <emulator>/usr/bin/qemu-system-x86_64</emulator> >>> <disk type='file' device='disk'> >>> <driver name='qemu' type='qcow2'/> >>> <source file='/var/lib/libvirt/images/ubuntu20.04-2-1.qcow2' >>> index='1'/> >>> <backingStore/> >>> <target dev='vda' bus='virtio'/> >>> <alias name='virtio-disk0'/> >>> <address type='pci' domain='0x0000' bus='0x03' slot='0x00' >>> function='0x0'/> >>> </disk> >>> <controller type='usb' index='0' model='ich9-ehci1'> >>> <alias name='usb'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' >>> function='0x7'/> >>> </controller> >>> <controller type='usb' index='0' model='ich9-uhci1'> >>> <alias name='usb'/> >>> <master startport='0'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' >>> function='0x0' multifunction='on'/> >>> </controller> >>> <controller type='usb' index='0' model='ich9-uhci2'> >>> <alias name='usb'/> >>> <master startport='2'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' >>> function='0x1'/> >>> </controller> >>> <controller type='usb' index='0' model='ich9-uhci3'> >>> <alias name='usb'/> >>> <master startport='4'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' >>> function='0x2'/> >>> </controller> >>> <controller type='sata' index='0'> >>> <alias name='ide'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' >>> function='0x2'/> >>> </controller> >>> <controller type='pci' index='0' model='pcie-root'> >>> <alias name='pcie.0'/> >>> </controller> >>> <controller type='pci' index='1' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='1' port='0x10'/> >>> <alias name='pci.1'/> >>> <address 
type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x0' multifunction='on'/> >>> </controller> >>> <controller type='pci' index='2' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='2' port='0x11'/> >>> <alias name='pci.2'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x1'/> >>> </controller> >>> <controller type='pci' index='3' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='3' port='0x12'/> >>> <alias name='pci.3'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x2'/> >>> </controller> >>> <controller type='pci' index='4' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='4' port='0x13'/> >>> <alias name='pci.4'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x3'/> >>> </controller> >>> <controller type='pci' index='5' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='5' port='0x14'/> >>> <alias name='pci.5'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x4'/> >>> </controller> >>> <controller type='pci' index='6' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='6' port='0x15'/> >>> <alias name='pci.6'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x5'/> >>> </controller> >>> <controller type='pci' index='7' model='pcie-root-port'> >>> <model name='pcie-root-port'/> >>> <target chassis='7' port='0x16'/> >>> <alias name='pci.7'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' >>> function='0x6'/> >>> </controller> >>> <controller type='virtio-serial' index='0'> >>> <alias name='virtio-serial0'/> >>> <address type='pci' domain='0x0000' bus='0x02' slot='0x00' >>> function='0x0'/> >>> </controller> >>> <interface type='vhostuser'> >>> <mac address='00:00:00:00:00:01'/> >>> <source type='unix' path='/tmp/dpdkvhostuser0' 
mode='server'/> >>> <target dev='dpdkvhostuser0'/> >>> <model type='virtio'/> >>> <driver name='vhost' queues='4' rx_queue_size='1024' >>> tx_queue_size='1024'> >>> <host mrg_rxbuf='off'/> >>> </driver> >>> <alias name='net0'/> >>> <address type='pci' domain='0x0000' bus='0x07' slot='0x00' >>> function='0x0'/> >>> </interface> >>> <interface type='network'> >>> <mac address='52:54:00:db:af:d7'/> >>> <source network='default' >>> portid='66f8c203-dc5d-4f18-94e2-a7a2dc75bec0' bridge='virbr0'/> >>> <target dev='vnet9'/> >>> <model type='virtio'/> >>> <alias name='net1'/> >>> <address type='pci' domain='0x0000' bus='0x01' slot='0x00' >>> function='0x0'/> >>> </interface> >>> <serial type='pty'> >>> <source path='/dev/pts/3'/> >>> <target type='isa-serial' port='0'> >>> <model name='isa-serial'/> >>> </target> >>> <alias name='serial0'/> >>> </serial> >>> <console type='pty' tty='/dev/pts/3'> >>> <source path='/dev/pts/3'/> >>> <target type='serial' port='0'/> >>> <alias name='serial0'/> >>> </console> >>> <channel type='unix'> >>> <source mode='bind' >>> path='/var/lib/libvirt/qemu/channel/target/domain-11-ubuntu20.04-3/org.qemu.guest_agent.0'/> >>> <target type='virtio' name='org.qemu.guest_agent.0' >>> state='disconnected'/> >>> <alias name='channel0'/> >>> <address type='virtio-serial' controller='0' bus='0' port='1'/> >>> </channel> >>> <input type='tablet' bus='usb'> >>> <alias name='input0'/> >>> <address type='usb' bus='0' port='1'/> >>> </input> >>> <input type='mouse' bus='ps2'> >>> <alias name='input1'/> >>> </input> >>> <input type='keyboard' bus='ps2'> >>> <alias name='input2'/> >>> </input> >>> <graphics type='vnc' port='5901' autoport='yes' listen='127.0.0.1'> >>> <listen type='address' address='127.0.0.1'/> >>> </graphics> >>> <sound model='ich9'> >>> <alias name='sound0'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' >>> function='0x0'/> >>> </sound> >>> <video> >>> <model type='vga' vram='16384' heads='1' primary='yes'/> >>> <alias 
name='video0'/> >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' >>> function='0x0'/> >>> </video> >>> <memballoon model='virtio'> >>> <alias name='balloon0'/> >>> <address type='pci' domain='0x0000' bus='0x04' slot='0x00' >>> function='0x0'/> >>> </memballoon> >>> <rng model='virtio'> >>> <backend model='random'>/dev/urandom</backend> >>> <alias name='rng0'/> >>> <address type='pci' domain='0x0000' bus='0x05' slot='0x00' >>> function='0x0'/> >>> </rng> >>> </devices> >>> </domain> >>> >>> --- >>> >>> The guest binds the device as such: >>> >>> # modprobe uio_pci_generic >>> # dpdk-stable-16.11.11/tools/dpdk-devbind.py -b uio_pci_generic 0000:07:00.0 >>> >>> Then runs the following two scripts in parallel to create, SIGKILL, and >>> immediately restart testpmd applications: >>> >>> -------------------------------------- >>> start_testpmd.sh: >>> >>> #! /bin/bash >>> >>> while true ; do >>> >>> /home/zhl/dpdk-stable-16.11.11/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd >>> --log-level=8 -c 0x1e -n 4 --socket-mem 512 -- -i --nb-cores=3 >>> --port-topology=chained --disable-hw-vlan --forward-mode=macswap >>> --auto-start --rxq=4 --txq=4 --rxd=512 --txd=512 --burst=32 >>> done >>> >>> -------------------------------------- >>> kill_testpmd.sh: >>> >>> #! /bin/bash >>> >>> while true ; do >>> sleep 2 >>> kill -9 `pgrep -x testpmd` >>> done >>> >>> -------------------------------------- >>> >>> After some initial investigation, I was able to only find workarounds, as I >>> am not familiar enough with the code involved. 
>>>
>>> The src pointer in the backtrace has the value 0x0c, which is derived from
>>> the NULL buf_addr as base address, plus the dev->vhost_hlen,
>>> as per lib/vhost/virtio_net.c:2726:
>>>     buf_offset = dev->vhost_hlen;
>>>     buf_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
>>>
>>> One thing I noticed while debugging OVS/DPDK, is that in the same file, in
>>> function
>>> virtio_dev_tx_split,
>>> as the code accesses the virtqueue and fills the struct buf_vector
>>> buf_vec[BUF_VECTOR_MAX] from the descriptors,
>>>
>>> using fill_vec_buf_split,
>>>
>>> the descriptors seem to be "corrupted" (actually zero at least in the
>>> upstream code) as the function fill_vec_buf_split accesses them.
>>> In particular, in upstream code (it's different with older versions of OVS
>>> and DPDK),
>>>
>>> I see vq->desc[idx] containing all zeroes: {.addr = 0, .len = 0, .flags =
>>> 0, .next = 0 }.
>>>
>>> I do not understand why this is, and hope someone can help figure out why
>>> these descriptors end up in this state?
>>>
>>> My current tentative workaround follows, but by no means I am sure of where
>>> the actual root cause is,
>>> this just seems to get me around the segfault for now:
>>>
>>> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
>>> index 35fa4670fd..098c735dbe 100644
>>> --- a/lib/vhost/virtio_net.c
>>> +++ b/lib/vhost/virtio_net.c
>>> @@ -722,6 +722,13 @@ fill_vec_buf_split(struct virtio_net *dev, struct
>>> vhost_virtqueue *vq,
>>>
>>>   *desc_chain_head = idx;
>>>
>>> + /* XXX claudio: why zero? */
>>> + if (unlikely(vq->desc[idx].addr == 0 || vq->desc[idx].len == 0)) {
>>> +     VHOST_LOG_DATA(dev->ifname, ERR, "claudio: skipping broken vq
>>> descriptor: addr=%llu, len=%u",
>>> +                    vq->desc[idx].addr, vq->desc[idx].len);
>>> +     goto out;
>>> + }
>>> +
>>>   if (vq->desc[idx].flags & VRING_DESC_F_INDIRECT) {
>>>       dlen = vq->desc[idx].len;
>>>       nr_descs = dlen / sizeof(struct vring_desc);
>>> @@ -773,6 +780,7 @@ fill_vec_buf_split(struct virtio_net *dev, struct
>>> vhost_virtqueue *vq,
>>>       idx = descs[idx].next;
>>>   }
>>>
>>> +out:
>>>   *desc_chain_len = len;
>>>   *vec_idx = vec_id;
>>>
>>> --------
>>
>> Just to add, this is not a solution, just sharing some observations.
>> Sometimes the vq->desc[idx].addr has other invalid values (for example, the
>> value 1).
>>
>> Not sure if specific checks for valid ranges would make sense, but likely
>> the real issue is somewhere else...
>>
>> Thanks,
>>
>> Claudio
>>
>
> I sent patches to address this, seems there is a missing check after calling
> fill_vec_buf_split to verify that nr_vecs is not 0.
>
> If it is 0, virtio_net attempts to access buf_vec[0], which contains
> uninitialized stack memory.
>
> The series is called "vhost fixes for OVS SIGSEGV in PMD", sent it already
> yesterday but did not show up in the mailing list for some reason.
>
> Thanks,
>
> Claudio

Hmm, this is likely better addressed in general in desc_to_mbuf and
mbuf_to_desc, by making them fail when nr_vec == 0?

Maxime, Jiayu Hu, do you have an opinion on this?

I saw you were the last to touch the checks in mbuf_to_desc. Could you
explain why the error check there is:

    if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
        return -1;

?

And also, why does it happen after:

    uint32_t vec_idx = 0;
    /* ... */
    buf_addr = buf_vec[vec_idx].buf_addr;
    buf_iova = buf_vec[vec_idx].buf_iova;
    buf_len = buf_vec[vec_idx].buf_len;

?

In my understanding, if nr_vec == 0, then buf_vec[0] can contain
uninitialized garbage.

Shouldn't we first check whether nr_vec is 0, and in that case bail out
immediately with a

    return -1;

?

Thanks,

Claudio
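
To make the proposed ordering concrete, here is a minimal standalone sketch.
It is not the DPDK code: struct sketch_buf_vector, struct sketch_dev,
sketch_first_buf and sketch_self_check are hypothetical stand-ins invented for
illustration, and the unlikely() hint is left out so the sketch builds without
DPDK headers. It only shows the nr_vec == 0 test being done before buf_vec[0]
is read, with the existing length check kept after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for DPDK's struct buf_vector. */
struct sketch_buf_vector {
    uint64_t buf_iova;
    uint64_t buf_addr;
    uint32_t buf_len;
};

/* Hypothetical stand-in for the one struct virtio_net field used here. */
struct sketch_dev {
    uint16_t vhost_hlen;
};

/*
 * Fetch the first buffer of a descriptor chain.
 * Returns 0 on success, -1 if the vector is empty or too short.
 */
int
sketch_first_buf(const struct sketch_dev *dev,
                 const struct sketch_buf_vector *buf_vec, uint16_t nr_vec,
                 uint64_t *buf_addr, uint32_t *buf_len)
{
    /* Proposed ordering: reject an empty vector before reading buf_vec[0],
     * which may hold uninitialized stack memory in that case. */
    if (nr_vec == 0)
        return -1;

    *buf_addr = buf_vec[0].buf_addr;
    *buf_len = buf_vec[0].buf_len;

    /* Same condition as the check quoted above, applied afterwards. */
    if (*buf_len < dev->vhost_hlen && nr_vec <= 1)
        return -1;

    return 0;
}

/* Small self-check exercising the three cases; returns 0 when all pass. */
int
sketch_self_check(void)
{
    struct sketch_dev dev = { .vhost_hlen = 12 };
    struct sketch_buf_vector ok = { .buf_addr = 0x1000, .buf_len = 2048 };
    struct sketch_buf_vector tiny = { .buf_addr = 0x2000, .buf_len = 4 };
    uint64_t addr = 0;
    uint32_t len = 0;

    /* Empty vector: fails, and the outputs are never written. */
    assert(sketch_first_buf(&dev, &ok, 0, &addr, &len) == -1);
    assert(addr == 0 && len == 0);

    /* One valid buffer: succeeds and reports its address and length. */
    assert(sketch_first_buf(&dev, &ok, 1, &addr, &len) == 0);
    assert(addr == 0x1000 && len == 2048);

    /* Single buffer shorter than the header: caught by the existing check. */
    assert(sketch_first_buf(&dev, &tiny, 1, &addr, &len) == -1);

    return 0;
}
```

The vhost_hlen value of 12 in the self-check is chosen only because it matches
the src=0xc seen in the backtrace (NULL buf_addr plus the header length);
whether the real fix belongs in the callers or inside desc_to_mbuf and
mbuf_to_desc is exactly the open question above.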