On Fri, 27 Apr 2018 16:41:42 +0000
Jianfeng Tan <jianfeng....@intel.com> wrote:

> The commit below introduced a pthread barrier for synchronization,
> but two IPC threads block on the barrier and never wake up.
> 
>   (gdb) bt
>   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/nptl/futex-internal.h:135
>   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0)
>       at pthread_barrier_wait.c:184
>   #3  rte_thread_init (arg=0x7fffffffcfe0)
>       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Through analysis, we find that defining the barrier on the stack could
> be the root cause. This patch allocates the barrier on the heap instead.
> 
> Fixes: d651ee4919cd ("eal: set affinity for control threads")
> 
> Cc: Olivier Matz <olivier.m...@6wind.com>
> Cc: Anatoly Burakov <anatoly.bura...@intel.com>
> 
> Signed-off-by: Jianfeng Tan <jianfeng....@intel.com>
> ---
>  lib/librte_eal/common/eal_common_thread.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
> index 4e75cb8..da2b84f 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -166,17 +166,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>               const pthread_attr_t *attr,
>               void *(*start_routine)(void *), void *arg)
>  {
> -     struct rte_thread_ctrl_params params = {
> -             .start_routine = start_routine,
> -             .arg = arg,
> -     };
> +     struct rte_thread_ctrl_params *params;
>       unsigned int lcore_id;
>       rte_cpuset_t cpuset;
>       int cpu_found, ret;
>  
> -     pthread_barrier_init(&params.configured, NULL, 2);
> +     params = malloc(sizeof(*params));
> +     if (!params)
> +             return -1;
> +
> +     params->start_routine = start_routine;
> +     params->arg = arg;
>  
> -     ret = pthread_create(thread, attr, rte_thread_init, (void *)&params);
> +     pthread_barrier_init(&params->configured, NULL, 2);
> +
> +     ret = pthread_create(thread, attr, rte_thread_init, (void *)params);
>       if (ret != 0)
>               return ret;
>  
> @@ -203,12 +207,14 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>       if (ret < 0)
>               goto fail;
>  
> -     pthread_barrier_wait(&params.configured);
> +     pthread_barrier_wait(&params->configured);
> +     free(params);
>  
>       return 0;
>  
>  fail:
>       pthread_cancel(*thread);
>       pthread_join(*thread, NULL);
> +     free(params);
>       return ret;
>  }

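This looks like a library bug. If there is a race on the configured barrier,
then putting it on the heap just moves the problem. The patch still has a bug:
the creating thread calls free(params) as soon as its own
pthread_barrier_wait() returns, so the other thread can end up referring to
freed memory.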
