On 4/28/2018 9:21 AM, Stephen Hemminger wrote:
On Fri, 27 Apr 2018 21:52:26 +0200
Thomas Monjalon <tho...@monjalon.net> wrote:
27/04/2018 19:45, Shreyansh Jain:
From: Stephen Hemminger [mailto:step...@networkplumber.org]
Shreyansh Jain <shreyansh.j...@nxp.com> wrote:
From: Jianfeng Tan
The commit below introduced a pthread barrier for synchronization.
But two IPC threads block on the barrier, and never wake up.
(gdb) bt
#0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
    at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_barrier_wait (barrier=0x7fffffffcff0)
    at pthread_barrier_wait.c:184
#3  rte_thread_init (arg=0x7fffffffcfe0)
    at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
#4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
#5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Through analysis, we find that defining the barrier on the stack
could be the root cause. This patch allocates the barrier from heap
memory instead.
Fixes: d651ee4919cd ("eal: set affinity for control threads")
Cc: Olivier Matz <olivier.m...@6wind.com>
Cc: Anatoly Burakov <anatoly.bura...@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng....@intel.com>
Though I have seen Stephen's comment on this (possibly a library
bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
generating bus errors and futex errors with variation in core masks
provided to applications.
Thanks a lot for this.
Acked-by: Shreyansh Jain <shreyansh.j...@nxp.com>
Applied, thanks Jianfeng.
Could you verify that there is no use after free, using valgrind or
some library that poisons memory on free?
I will probably do that soon - but for the time being I don't want
this issue to block the dpaa/dpaa2 for RC1 - these drivers were
completely unusable without this patch.
Please Shreyansh, continue the analysis of this bug.
Thanks
I think the patch needs to change.
The attributes need to be either global or leaked (never freed).
The glibc source for init keeps the pointer to the attributes.
I did not follow why we need to add attr here. Besides, init only uses
attr to decide the futex type (private or shared); it does not seem to
keep the pointer.
So I cannot understand why we need to add a non-null attr parameter.
Thanks,
Jianfeng
static const struct pthread_barrierattr default_barrierattr =
  {
    .pshared = PTHREAD_PROCESS_PRIVATE
  };

int
__pthread_barrier_init (pthread_barrier_t *barrier,
                        const pthread_barrierattr_t *attr, unsigned int count)
{
  struct pthread_barrier *ibarrier;

  /* XXX EINVAL is not specified by POSIX as a possible error code for COUNT
     being too large.  See pthread_barrier_wait for the reason for the
     comparison with BARRIER_IN_THRESHOLD.  */
  if (__glibc_unlikely (count == 0 || count >= BARRIER_IN_THRESHOLD))
    return EINVAL;

  const struct pthread_barrierattr *iattr
    = (attr != NULL
       ? (struct pthread_barrierattr *) attr
       : &default_barrierattr);

  ibarrier = (struct pthread_barrier *) barrier;

  /* Initialize the individual fields.  */
  ibarrier->in = 0;
  ibarrier->out = 0;
  ibarrier->count = count;
  ibarrier->current_round = 0;
  ibarrier->shared = (iattr->pshared == PTHREAD_PROCESS_PRIVATE
                      ? FUTEX_PRIVATE : FUTEX_SHARED);

  return 0;
}
weak_alias (__pthread_barrier_init, pthread_barrier_init)