On Fri, 27 Apr 2018 16:41:42 +0000 Jianfeng Tan <jianfeng....@intel.com> wrote:
> Below commit introduced pthread barrier for synchronization. > But two IPC threads block on the barrier, and never wake up. > > (gdb) bt > #0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4) > at ../sysdeps/unix/sysv/linux/futex-internal.h:61 > #1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4) > at ../sysdeps/nptl/futex-internal.h:135 > #2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at > pthread_barrier_wait.c:184 > #3 rte_thread_init (arg=0x7fffffffcfe0) > at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160 > #4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333 > #5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 > > Through analysis, we find the barrier defined on the stack could be the > root cause. This patch will change to use heap memory as the barrier. > > Fixes: d651ee4919cd ("eal: set affinity for control threads") > > Cc: Olivier Matz <olivier.m...@6wind.com> > Cc: Anatoly Burakov <anatoly.bura...@intel.com> > > Signed-off-by: Jianfeng Tan <jianfeng....@intel.com> > --- > lib/librte_eal/common/eal_common_thread.c | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/lib/librte_eal/common/eal_common_thread.c > b/lib/librte_eal/common/eal_common_thread.c > index 4e75cb8..da2b84f 100644 > --- a/lib/librte_eal/common/eal_common_thread.c > +++ b/lib/librte_eal/common/eal_common_thread.c > @@ -166,17 +166,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char > *name, > const pthread_attr_t *attr, > void *(*start_routine)(void *), void *arg) > { > - struct rte_thread_ctrl_params params = { > - .start_routine = start_routine, > - .arg = arg, > - }; > + struct rte_thread_ctrl_params *params; > unsigned int lcore_id; > rte_cpuset_t cpuset; > int cpu_found, ret; > > - pthread_barrier_init(¶ms.configured, NULL, 2); > + params = malloc(sizeof(*params)); > + if (!params) > + return -1; > + > + params->start_routine = start_routine; > + params->arg = arg; > > - ret = pthread_create(thread, attr, rte_thread_init, (void *)¶ms); > + pthread_barrier_init(¶ms->configured, NULL, 2); > + > + ret = pthread_create(thread, attr, rte_thread_init, (void *)params); > if (ret != 0) > return ret; > > @@ -203,12 +207,14 @@ rte_ctrl_thread_create(pthread_t *thread, const char > *name, > if (ret < 0) > goto fail; > > - pthread_barrier_wait(¶ms.configured); > + pthread_barrier_wait(¶ms->configured); > + free(params); > > return 0; > > fail: > pthread_cancel(*thread); > pthread_join(*thread, NULL); > + free(params); > return ret; > } This looks like a library bug. If there is a race on the configured barrier, then putting on heap is just moving problem. It still has bug where other thread is referring to freed memory.