On 4/28/2018 9:24 AM, Stephen Hemminger wrote:
On Fri, 27 Apr 2018 21:52:26 +0200
Thomas Monjalon <tho...@monjalon.net> wrote:

27/04/2018 19:45, Shreyansh Jain:
From: Stephen Hemminger [mailto:step...@networkplumber.org]
Shreyansh Jain <shreyansh.j...@nxp.com> wrote:
From: Jianfeng Tan
The commit below introduced a pthread barrier for synchronization,
but two IPC threads block on the barrier and never wake up.

   (gdb) bt
   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
       at ../sysdeps/nptl/futex-internal.h:135
   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0)
       at pthread_barrier_wait.c:184
   #3  rte_thread_init (arg=0x7fffffffcfe0)
       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Our analysis indicates that the barrier being defined on the stack
could be the root cause. This patch allocates the barrier on the
heap instead.
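A minimal sketch of the heap-allocated pattern, assuming a plain POSIX threads program rather than DPDK's actual code: `struct thread_params`, `thread_start()` and `create_and_join()` are hypothetical names. The barrier lives in a malloc()ed block, so its storage does not depend on the creator's stack frame still being alive. To keep cleanup trivially safe, this demo joins the thread before calling pthread_barrier_destroy() and free(), so neither thread can still be inside pthread_barrier_wait(). Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical parameter block handed to the new thread. */
struct thread_params {
    pthread_barrier_t configured; /* heap storage: not tied to the creator's stack frame */
    int value;                    /* set by the creator before pthread_create() */
    int seen;                     /* written by the new thread after the barrier */
};

static void *thread_start(void *arg)
{
    struct thread_params *params = arg;
    int rc;

    /* Block until the creator has finished configuring this thread. */
    rc = pthread_barrier_wait(&params->configured);
    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    (void)rc;
    params->seen = params->value;
    return NULL;
}

/* Create a thread, release it through the barrier, then join and clean up.
 * Returns the value observed by the new thread, or -1 on error. */
static int create_and_join(int value)
{
    pthread_t tid;
    int rc, seen;
    struct thread_params *params = malloc(sizeof(*params));

    if (params == NULL)
        return -1;
    params->value = value;
    params->seen = -1;
    if (pthread_barrier_init(&params->configured, NULL, 2) != 0) {
        free(params);
        return -1;
    }
    if (pthread_create(&tid, NULL, thread_start, params) != 0) {
        pthread_barrier_destroy(&params->configured);
        free(params);
        return -1;
    }
    /* Per-thread setup (e.g. setting CPU affinity) would go here. */
    rc = pthread_barrier_wait(&params->configured);
    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    (void)rc;

    /* Once joined, neither thread can still be inside pthread_barrier_wait(),
     * so destroying and freeing the barrier cannot race with it. */
    pthread_join(tid, NULL);
    seen = params->seen;
    pthread_barrier_destroy(&params->configured);
    free(params);
    return seen;
}
```

For example, `create_and_join(42)` returns 42 once the new thread has passed the barrier and been joined.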

Fixes: d651ee4919cd ("eal: set affinity for control threads")

Cc: Olivier Matz <olivier.m...@6wind.com>
Cc: Anatoly Burakov <anatoly.bura...@intel.com>

Signed-off-by: Jianfeng Tan <jianfeng....@intel.com>
Although I have seen Stephen's comment on this (possibly a library
bug), this patch at least fixes an issue that was dogging dpaa and
dpaa2, which generated bus errors and futex errors depending on the
core mask provided to the application.
Thanks a lot for this.

Acked-by: Shreyansh Jain <shreyansh.j...@nxp.com>
Applied, thanks Jianfeng.

Could you verify that there is no use-after-free by running under
valgrind or a library that poisons memory on free?
I will probably do that soon, but for the time being I don't want
this issue to block dpaa/dpaa2 for RC1; these drivers were
completely unusable without this patch.
Shreyansh, please continue the analysis of this bug.
Thanks


The pthread_barrier should also be destroyed when it is no longer needed.

I tried this, and destroying the barrier could also kick the sleeping thread awake; but since "the effect of subsequent use of the barrier is undefined", I did not take that approach.

Anyway, I agree that pthread_barrier_destroy() should be called for completeness.
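When the new thread is long-lived and the creator cannot join it (as with a control thread), one way to address both the use-after-free concern and the missing destroy is to reference-count the heap block: each thread drops its reference only after its own pthread_barrier_wait() has returned, and whichever thread drops the last reference destroys the barrier and frees the block. This is a hedged sketch, not the code that was applied: `struct ctrl_params`, `ctrl_params_put()`, `create_ctrl_thread()` and the `blocks_released` counter are hypothetical, and it assumes C11 `<stdatomic.h>` plus -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared block; refs counts the threads still using it. */
struct ctrl_params {
    pthread_barrier_t configured;
    atomic_int refs;
};

static atomic_int blocks_released = 0; /* demo only: number of blocks freed */

/* Drop one reference; the last thread out destroys the barrier and frees. */
static void ctrl_params_put(struct ctrl_params *p)
{
    if (atomic_fetch_sub(&p->refs, 1) == 1) {
        /* Both threads have already returned from pthread_barrier_wait(). */
        pthread_barrier_destroy(&p->configured);
        free(p);
        atomic_fetch_add(&blocks_released, 1);
    }
}

static void *ctrl_thread_start(void *arg)
{
    struct ctrl_params *p = arg;
    int rc = pthread_barrier_wait(&p->configured);

    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    (void)rc;
    ctrl_params_put(p); /* p must not be used after this call */
    /* Long-running control-thread work would go here. */
    return NULL;
}

/* Create a thread without needing to join it to reclaim the barrier. */
static int create_ctrl_thread(pthread_t *tid)
{
    struct ctrl_params *p = malloc(sizeof(*p));
    int rc;

    if (p == NULL)
        return -1;
    atomic_init(&p->refs, 2); /* one reference each: creator and new thread */
    if (pthread_barrier_init(&p->configured, NULL, 2) != 0) {
        free(p);
        return -1;
    }
    if (pthread_create(tid, NULL, ctrl_thread_start, p) != 0) {
        pthread_barrier_destroy(&p->configured);
        free(p);
        return -1;
    }
    /* Per-thread setup (e.g. setting CPU affinity) would go here. */
    rc = pthread_barrier_wait(&p->configured);
    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    (void)rc;
    ctrl_params_put(p); /* may free p if the new thread already dropped its ref */
    return 0;
}
```

The reference count makes cleanup independent of which thread leaves the barrier first. A test may join the threads only to check `blocks_released` deterministically; reclaiming the barrier itself no longer depends on pthread_join().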

Thanks,
Jianfeng
