Hello Luca,

>
> This commit introduced a regression on arm64, causing a deadlock.
> lcores_autotest gets stuck and never terminates:
>
> [ 1077s] EAL: Detected CPU lcores: 4
> [ 1077s] EAL: Detected NUMA nodes: 1
> [ 1077s] EAL: Detected shared linkage of DPDK
> [ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
> [ 1077s] EAL: Selected IOVA mode 'VA'
> [ 1077s] APP: HPET is not enabled, using TSC as default timer
> [ 1077s] RTE>>lcores_autotest
> [ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
>
> This is 100% reproducible when running the fast tests suite
> after a package build on OBS. Reverting it reliably fixes the
> issue.
>
> This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
>
> Signed-off-by: Luca Boccassi <luca.bocca...@gmail.com>
> ---
> v2: add forgotten signed-off-by
>
> I have bisected this long standing issue and identified the commit
> that introduced it. If anybody can provide a different fix that would
> be better, but if it's not possible to find another solution, it would
> be good to revert it until it can be found, to resolve the regression.
Thanks for tracking this down.

There is one issue with reverting: iirc, it reintroduces a race / double-free.

Could you share a backtrace when hitting this deadlock?

On my side, I am not able to catch it, either on x86 or in an ARM VM I borrowed.
I built dpdk manually in a Debian 12 container, trying to mimic OBS cflags & friends.

# rm -rf build-debian; CC='ccache gcc' meson setup build-debian -Dmachine=default -Dbuildtype=plain -Ddefault_library=shared -Dc_args='-O2 -fstack-protector-strong -Wformat -Werror=format-security -Werror -Wdate-time -D_FORTIFY_SOURCE=2' && ninja -C build-debian && meson test -C build-debian --suite fast-tests --verbose -t 5
...
36/81 DPDK:fast-tests / lcores_autotest RUNNING
>>> LD_LIBRARY_PATH=/root/dpdk/build-debian/lib:/root/dpdk/build-debian/drivers
>>> MALLOC_PERTURB_=90 DPDK_TEST=lcores_autotest
>>> /root/dpdk/build-debian/app/dpdk-test --no-huge -m 2048 -d
>>> /root/dpdk/build-debian/drivers
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
EAL: Detected CPU lcores: 3
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
VIRTIO_INIT: eth_virtio_pci_init(): Failed to init PCI device
PCI_BUS: Requested device 0000:01:00.0 cannot be used
APP: HPET is not enabled, using TSC as default timer
RTE>>lcores_autotest
EAL threads count: 3, RTE_MAX_LCORE=256
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
non-EAL threads count: 253
Warning: could not register new thread (this might be expected during this test), reason Cannot allocate memory
non-EAL threads count: 254
Warning: could not register new thread (this might be expected during this test), reason Cannot allocate memory
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
lcore 3, socket 0, role NON_EAL, cpuset 0
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
Control thread running successfully
Test OK
RTE>>
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
36/81 DPDK:fast-tests / lcores_autotest OK 1.87s

This VM runs on:
# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 3
On-line CPU(s) list: 0-2
Vendor ID: ARM
BIOS Vendor ID: QEMU
Model name: Neoverse-N1
BIOS Model name: virt-rhel8.6.0 CPU @ 2.0GHz
...

-- 
David Marchand