On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole <acon...@redhat.com> wrote:
>
> Ruifeng Wang <ruifeng.w...@arm.com> writes:
>
> > Distributor and worker threads rely on data structs in cache line
> > for synchronization. The shared data structs were not protected.
> > This caused deadlock issue on weaker memory ordering platforms as
> > aarch64.
> > Fix this issue by adding memory barriers to ensure synchronization
> > among cores.
> >
> > Bugzilla ID: 342
> > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > ---
>
> I see a failure in the distributor_autotest (on one of the builds):
>
> 64/82 DPDK:fast-tests / distributor_autotest FAIL 0.37 s (exit status
> 255 or signal 127 SIGinvalid)
>
> --- command ---
>
> DPDK_TEST='distributor_autotest'
> /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1
> --file-prefix=distributor_autotest
>
> --- stdout ---
>
> EAL: Probing VFIO support...
> APP: HPET is not enabled, using TSC as default timer
> RTE>>distributor_autotest
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (single) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (single) ===
> Sanity test with mbuf alloc/free passed
> Too few cores to run worker shutdown test
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (burst) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (burst) ===
> Line 326: Packet count is incorrect, 1048568, expected 1048576
> Test Failed
> RTE>>
>
> --- stderr ---
>
> EAL: Detected 2 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available hugepages reported in hugepages-1048576kB
>
> -------
>
> Not sure how to help debug further. I'll re-start the job to see if
> it 'clears' up - but I guess there may be a delicate synchronization
> somewhere that needs to be accounted.

Idem, and with the same loop I used before, it can be caught quickly.

# time (log=/tmp/$$.log; while true; do echo distributor_autotest |taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
[snip]
RTE>>distributor_autotest
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 2MB
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (single) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (single) ===
Sanity test with mbuf alloc/free passed
Too few cores to run worker shutdown test
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (burst) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (burst) ===
Line 326: Packet count is incorrect, 1048568, expected 1048576
Test Failed
RTE>>

real 0m36.668s
user 1m7.293s
sys 0m1.560s

Could be worth running this loop on all tests?
(not talking about the CI, it would be a manual effort to catch lurking issues).

--
David Marchand
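
For anyone wanting to repeat the loop above on other tests, here is a rough sketch of the same idea as a reusable shell function. It is not part of DPDK: the name repeat_until_fail, its argument order, and the retry cap are all hypothetical. Like the one-liner, it decides pass/fail by grepping the captured output for a success marker rather than trusting the exit status.

```shell
#!/bin/sh
# Hypothetical helper (not a DPDK tool): rerun a command until its output
# stops containing a success marker, or until a retry cap is reached.
# Usage: repeat_until_fail MAX_RUNS OK_PATTERN LOGFILE command [args...]
# Returns 1 and keeps LOGFILE from the failing run; returns 0 otherwise.
repeat_until_fail() {
    max_runs=$1
    ok_pattern=$2
    log=$3
    shift 3
    run=1
    while [ "$run" -le "$max_runs" ]; do
        # Capture stdout and stderr of this run, overwriting the last log.
        "$@" >"$log" 2>&1
        if ! grep -q "$ok_pattern" "$log"; then
            echo "failed on run $run"
            return 1
        fi
        run=$((run + 1))
    done
    rm -f "$log"
    echo "no failure in $max_runs runs"
    return 0
}

# Possible invocation, assuming the DPDK_TEST variable from the CI command
# above selects the test and that a passing run prints 'Test OK':
#   repeat_until_fail 1000 'Test OK' /tmp/distributor.log \
#       env DPDK_TEST=distributor_autotest \
#       taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test -l 0-1
```

The cap is only there so an unattended run terminates; the original loop runs until the first failure.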