On 4/28/2018 9:24 AM, Stephen Hemminger wrote:
On Fri, 27 Apr 2018 21:52:26 +0200
Thomas Monjalon <tho...@monjalon.net> wrote:
27/04/2018 19:45, Shreyansh Jain:
From: Stephen Hemminger [mailto:step...@networkplumber.org]
Shreyansh Jain <shreyansh.j...@nxp.com> wrote:
From: Jianfeng Tan
The commit referenced below introduced a pthread barrier for synchronization.
However, two IPC threads block on the barrier and never wake up.
(gdb) bt
#0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
    at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_barrier_wait (barrier=0x7fffffffcff0)
    at pthread_barrier_wait.c:184
#3 rte_thread_init (arg=0x7fffffffcfe0)
    at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
#4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
#5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Our analysis suggests that the barrier, which is defined on the stack,
could be the root cause. This patch allocates the barrier on the heap
instead.
Fixes: d651ee4919cd ("eal: set affinity for control threads")
Cc: Olivier Matz <olivier.m...@6wind.com>
Cc: Anatoly Burakov <anatoly.bura...@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng....@intel.com>
Although I have seen Stephen's comment on this (possibly a library
bug), this patch at least fixes an issue that has been dogging dpaa and
dpaa2: bus errors and futex errors that vary with the core masks
provided to applications.
Thanks a lot for this.
Acked-by: Shreyansh Jain <shreyansh.j...@nxp.com>
Applied, thanks Jianfeng.
Could you verify that there is no use-after-free, using valgrind or
a library that poisons memory on free?
I will probably do that soon, but for the time being I don't want
this issue to block dpaa/dpaa2 for RC1; these drivers were
completely unusable without this patch.
Shreyansh, please continue the analysis of this bug.
Thanks
The pthread barrier should also be destroyed (pthread_barrier_destroy()) when it is no longer needed.
I tried this; destroying the barrier could also wake the sleeping thread.
But because the specification says "The effect of subsequent use of the
barrier is undefined", I did not take that approach.
Anyway, I agree that pthread_barrier_destroy() should be called for completeness.
Thanks,
Jianfeng
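A minimal C sketch of the two points discussed in this thread, assuming a control-thread helper shaped roughly like the one in the patch: the barrier lives in a heap-allocated parameter block instead of on the creator's stack, and pthread_barrier_destroy() plus free() are done by whichever participant receives PTHREAD_BARRIER_SERIAL_THREAD from pthread_barrier_wait(). The names struct ctrl_thread_params, ctrl_thread_init(), start_ctrl_thread() and demo_routine() are hypothetical; this is not the code that was applied. The sketch assumes that pthread_barrier_destroy() does not race with a waiter that has been released but has not yet returned from pthread_barrier_wait(). POSIX only forbids destroying a barrier while threads are blocked on it, so this is worth confirming on the target C library, for example with the valgrind check suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical parameter block shared by the creating thread and the new
 * control thread. It is heap-allocated, so it remains valid even after the
 * creator's stack frame has been popped or reused. */
struct ctrl_thread_params {
	void *(*start_routine)(void *);
	void *arg;
	pthread_barrier_t configured; /* both threads rendezvous here */
};

/* Exactly one waiter gets PTHREAD_BARRIER_SERIAL_THREAD; only that waiter
 * destroys the barrier and frees the block, so it is released exactly once. */
static void release_params(struct ctrl_thread_params *p, int wait_ret)
{
	assert(wait_ret == 0 || wait_ret == PTHREAD_BARRIER_SERIAL_THREAD);
	if (wait_ret == PTHREAD_BARRIER_SERIAL_THREAD) {
		pthread_barrier_destroy(&p->configured);
		free(p);
	}
}

/* Hypothetical trampoline run by the new thread. */
static void *ctrl_thread_init(void *arg)
{
	struct ctrl_thread_params *p = arg;
	/* Copy what we need first: once both threads have passed the barrier,
	 * the creator may already have freed p. */
	void *(*start_routine)(void *) = p->start_routine;
	void *routine_arg = p->arg;

	int ret = pthread_barrier_wait(&p->configured);
	release_params(p, ret);
	return start_routine(routine_arg);
}

/* Hypothetical helper: create a thread and wait until it has started.
 * Returns 0 on success or an errno value on failure. */
int start_ctrl_thread(pthread_t *tid, void *(*start_routine)(void *), void *arg)
{
	struct ctrl_thread_params *p = malloc(sizeof(*p));
	if (p == NULL)
		return ENOMEM;
	p->start_routine = start_routine;
	p->arg = arg;

	int ret = pthread_barrier_init(&p->configured, NULL, 2);
	if (ret != 0) {
		free(p);
		return ret;
	}
	ret = pthread_create(tid, NULL, ctrl_thread_init, p);
	if (ret != 0) {
		/* The new thread never ran, so nobody else uses the barrier. */
		pthread_barrier_destroy(&p->configured);
		free(p);
		return ret;
	}
	/* Creator-side setup (e.g. CPU affinity) would go here. */
	ret = pthread_barrier_wait(&p->configured);
	release_params(p, ret);
	return 0;
}

/* Example start routine for illustration: writes 42 through its argument. */
static void *demo_routine(void *arg)
{
	int *out = arg;
	*out = 42;
	return arg;
}
```

Calling start_ctrl_thread(&tid, demo_routine, &value) and then pthread_join(tid, NULL) leaves value set to 42 (build with -pthread). Letting the serial waiter do the cleanup avoids a separate reference count. Because only one participant receives PTHREAD_BARRIER_SERIAL_THREAD, the block is destroyed and freed exactly once, and neither thread touches it afterwards.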