On 19/11/2020 11:21, Stokes, Ian wrote:
>> Hi,
>> We are seeing an ovs-vswitchd service crash with a segfault in the
>> librte_vhost library when a DPDK application within a guest VM is stopped.
>>
>> We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062 Linux kernel) with
>> DPDK 18.11.2.
>
> Hi,
>
> Is there a reason you are using OVS 2.11.1 and DPDK 18.11.2? These are quite
> old.
>
> As a first step I would recommend using the latest of these branches that
> have been validated by the OVS community.
>
> As of now this would be OVS 2.11.4 and DPDK 18.11.9, to check if the issue is
> still present there. My suspicion is that this could be an issue resolved in
> the DPDK library since 18.11.2.
>

+1, there are 58 commits in the vhost library on the 18.11 branch since
18.11.2, so it might already be fixed. 18.11.10 is the latest release, and
the commit below has been included since 18.11.7.

$ git log --oneline v18.11.2..HEAD . | grep crash
90b5ba739f vhost: fix crash on port deletion

If you are planning to continue to use 18.11 for a while, I think you will
want to test the 18.11.11 Release Candidate that will be available in a few
weeks. It is the last planned 18.11 release, so any issues you find *after*
it is released won't be fixed.

Kevin.

> Regards
> Ian
>
>>
>> We are using OVS-DPDK on the host and the guest VM is running a DPDK
>> application. With some traffic, if the application service within the VM is
>> restarted, then OVS crashes.
>>
>> This crash is not seen if the guest VM is restarted (instead of stopping
>> the application within the VM).
>>
>> The crash backtrace (attached below) points to the
>> rte_memcpy_generic() function in rte_memcpy.h. It looks like the crash occurs
>> when vhost is trying to dequeue the packets from the guest VM (as the
>> application in the guest VM has stopped and the huge pages are returned to
>> the guest kernel).
>>
>> We have tried enabling iommu in ovs by setting
>> "other_config:vhost-iommu-support=true" and enabling iommu in qemu using
>> the following configuration in the guest domain XML:
>>     <iommu model='intel'>
>>         <driver intremap='on'/>
>>     </iommu>
>> With iommu enabled, ovs-vswitchd still crashes when the guest VM restarts
>> the network service.
>>
>> Is this a known problem? Anyone else seen a crash like this? How can
>> we protect ovs-vswitchd from crashing when a guest VM restarts the
>> network application or service?
>>
>> Thanks
>> Alex
>> ------------------------------------------------------------------------
>>
>> Log:
>> Oct 7 19:54:16 Branch81-Bravo kernel: [2245909.596635] pmd16[25721]:
>> segfault at 7f4d1d733000 ip 00007f4d2ae5d066 sp 00007f4d1ce65618 error 4 in
>> librte_vhost.so.4[7f4d2ae52000+1a000]
>> Oct 7 19:54:19 Branch81-Bravo systemd[1]: ovs-vswitchd.service: main process
>> exited, code=killed, status=11/SEGV
>>
>> Environment:
>> CentOs 7.6.1810
>> openvswitch-2.11.1-1.el7.centos.x86_64
>> openvswitch-kmod-2.11.1-1.el7.centos.x86_64
>> dpdk-18.11-2.el7.centos.x86_64
>> 3.10.0-1062.4.1.el7.x86_64
>> qemu-kvm-ev-2.12.0-18.el7.centos_6.1.1
>>
>> Core dump trace:
>> (gdb) bt
>> #-1 0x00007ffff205602e in rte_memcpy_generic (dst=<optimized out>,
>>     src=0x7fffcef3607c, n=<optimized out>)
>>     at /usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793
>> Backtrace stopped: Cannot access memory at address 0x7ffff20558f0
>>
>> (gdb) list *0x00007ffff205602e
>> 0x7ffff205602e is in rte_memcpy_generic
>> (/usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793).
>> 788     }
>> 789
>> 790     /**
>> 791      * For copy with unaligned load
>> 792      */
>> 793     MOVEUNALIGNED_LEFT47(dst, src, n, srcofs);
>> 794
>> 795     /**
>> 796      * Copy whatever left
>> 797      */
>>
>> (gdb) list *0x00007ffff205c192
>> 0x7ffff205c192 is in rte_vhost_dequeue_burst
>> (/usr/src/debug/dpdk-18.11/lib/librte_vhost/virtio_net.c:1192).
>> 1187     * In zero copy mode, one mbuf can only reference data
>> 1188     * for one or partial of one desc buff.
>> 1189     */
>> 1190    mbuf_avail = cpy_len;
>> 1191    } else {
>> 1192        if (likely(cpy_len > MAX_BATCH_LEN ||
>> 1193                   vq->batch_copy_nb_elems >= vq->size ||
>> 1194                   (hdr && cur == m))) {
>> 1195            rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
>> 1196                                               mbuf_offset),
>> (gdb)
>>
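
For anyone triaging a similar report, the kernel segfault line in the log
above can be decoded by hand. The following is only a rough sketch: the
variable names are made up for illustration, the values are copied from the
quoted log, and the bit meanings are the standard x86 page-fault error code
bits. It computes how far the faulting instruction lies inside the
librte_vhost.so.4 mapping and what kind of access faulted.

```shell
#!/bin/sh
# Values copied from the quoted kernel log line (names are illustrative).
fault_ip=0x00007f4d2ae5d066   # "ip": faulting instruction pointer
map_base=0x7f4d2ae52000       # start of the librte_vhost.so.4 mapping
map_size=0x1a000              # length of that mapping
err=4                         # "error 4": x86 page-fault error code

# Offset of the faulting instruction from the start of the mapping.
offset=$(printf '0x%x' $(( $fault_ip - $map_base )))

# Sanity check: the ip should fall inside the reported mapping.
in_range=$(( ($fault_ip - $map_base) < $map_size ))

# Decode the low three bits of the x86 page-fault error code.
prot_fault=$(( $err & 1 ))        # 0 = page not present, 1 = protection fault
is_write=$(( ($err >> 1) & 1 ))   # 0 = read access,      1 = write access
is_user=$(( ($err >> 2) & 1 ))    # 0 = kernel mode,      1 = user mode

echo "offset=$offset in_range=$in_range"
echo "prot_fault=$prot_fault is_write=$is_write is_user=$is_user"

# With matching debuginfo installed, the offset can usually be resolved to a
# source line. The library path is a placeholder, and this assumes the
# reported mapping begins at the library's load base:
#   addr2line -f -e /path/to/librte_vhost.so.4 "$offset"
```

Read this way, error 4 is a user-mode read of a page that is not mapped,
which fits the theory that the source buffer went away when the guest
application stopped.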