https://bugs.dpdk.org/show_bug.cgi?id=316
Bug ID: 316 Summary: livelock causes librte_distributor unit test to hang Product: DPDK Version: 19.08 Hardware: x86 OS: Linux Status: UNCONFIRMED Severity: normal Priority: Normal Component: other Assignee: dev@dpdk.org Reporter: msant...@redhat.com Target Milestone: --- Issue first encountered in Travis CI. The meson unit test distributor_autotest randomly times out. Normally this test finishes in less than 1/2 a second, so running to 10 seconds and timing out is a big jump in run time. I was able to reproduce by running: `while sudo sh -c "echo 'distributor_autotest' | ./build/app/test/dpdk-test"; do :; done` It runs a couple of times fine showing output and showing progress, but then at some point after a couple of seconds it just stops - no longer getting any output. It just sits there with no further output. I let it sit there for a whole minute and nothing happens. So I attach gdb to try to figure out what is happening. One thread seems to be stuck on a while loop, see lib/librte_distributor/rte_distributor.c:310. I looked at the assembly code (layout asm, ni) and I saw these four lines below (which correspond to the while loop) being executed repeatedly and indefinitely. It looks like this thread is waiting for the variable bufptr64[0] to change state. 0xa064d0 <release+32> pause 0xa064d2 <release+34> mov 0x3840(%rdx),%rax 0xa064d9 <release+41> test $0x1,%al 0xa064db <release+43> je 0xa064d0 <release+32> While the first thread is waiting on bufptr64[0] to change state, there is another thread that is also stuck on another while loop on lib/librte_distributor/rte_distributor.c:53. It seems that this thread is stuck waiting for retptr64 to change state. Corresponding assembly being executed indefinitely: 0xa06de0 <rte_distributor_request_pkt_v1705+592> mov 0x38c0(%r8),%rax 0xa06de7 <rte_distributor_request_pkt_v1705+599> test $0x1,%al 0xa06de9 <rte_distributor_request_pkt_v1705+601> je 0xa06bbd <rte_distributor_request_pkt_v1705+45> 0xa06def <rte_distributor_request_pkt_v1705+607> nop 0xa06df0 <rte_distributor_request_pkt_v1705+608> pause 0xa06df2 <rte_distributor_request_pkt_v1705+610> rdtsc 0xa06df4 <rte_distributor_request_pkt_v1705+612> mov %rdx,%r10 0xa06df7 <rte_distributor_request_pkt_v1705+615> shl $0x20,%r10 0xa06dfb <rte_distributor_request_pkt_v1705+619> mov %eax,%eax 0xa06dfd <rte_distributor_request_pkt_v1705+621> or %r10,%rax 0xa06e00 <rte_distributor_request_pkt_v1705+624> lea 0x64(%rax),%r10 0xa06e04 <rte_distributor_request_pkt_v1705+628> jmp 0xa06e12 <rte_distributor_request_pkt_v1705+642> 0xa06e06 <rte_distributor_request_pkt_v1705+630> nopw %cs:0x0(%rax,%rax,1) 0xa06e10 <rte_distributor_request_pkt_v1705+640> pause 0xa06e12 <rte_distributor_request_pkt_v1705+642> rdtsc 0xa06e14 <rte_distributor_request_pkt_v1705+644> shl $0x20,%rdx 0xa06e18 <rte_distributor_request_pkt_v1705+648> mov %eax,%eax 0xa06e1a <rte_distributor_request_pkt_v1705+650> or %rdx,%rax 0xa06e1d <rte_distributor_request_pkt_v1705+653> cmp %rax,%r10 0xa06e20 <rte_distributor_request_pkt_v1705+656> ja 0xa06e10 <rte_distributor_request_pkt_v1705+640> 0xa06e22 <rte_distributor_request_pkt_v1705+658> jmp 0xa06de0 <rte_distributor_request_pkt_v1705+592> My guess is that these threads are interdependent, so one thread is waiting for the other thread to change the state of the control variable. I can't say for sure if this is what is happening or why the these variables don't change state, so I would like ask someone who is more familiar with this particular code to take a look -- You are receiving this mail because: You are the assignee for the bug.