On Wed, Jul 15, 2020 at 3:41 PM Stanislav Fomichev <s...@google.com> wrote:
>
> Andrii reported that sockopt_inherit occasionally hangs on a 5.5 kernel [0].
> This can happen if server_thread runs faster than the main thread.
> In that case, pthread_cond_wait will wait forever because
> pthread_cond_signal was executed before the main thread started waiting.
> Let's move pthread_mutex_lock above pthread_create to make sure
> server_thread signals strictly after the main thread goes to sleep.
>
> (Not sure why this is 5.5 specific, maybe scheduling is less
> deterministic? But I was able to confirm that it does indeed
> happen in a VM.)
>
> [0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isuayg-dfzgdy7qwykmm7rm...@mail.gmail.com/
>
> Reported-by: Andrii Nakryiko <andr...@fb.com>
> Signed-off-by: Stanislav Fomichev <s...@google.com>
> ---

Great, thanks for figuring this out! Hopefully this is it.

Acked-by: Andrii Nakryiko <andr...@fb.com>

>  tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c b/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
> index 8547ecbdc61f..ec281b0363b8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
> @@ -193,11 +193,10 @@ static void run_test(int cgroup_fd)
>         if (CHECK_FAIL(server_fd < 0))
>                 goto close_bpf_object;
>
> +       pthread_mutex_lock(&server_started_mtx);
>         if (CHECK_FAIL(pthread_create(&tid, NULL, server_thread,
>                                       (void *)&server_fd)))
>                 goto close_server_fd;
> -
> -       pthread_mutex_lock(&server_started_mtx);
>         pthread_cond_wait(&server_started, &server_started_mtx);
>         pthread_mutex_unlock(&server_started_mtx);
>
> --
> 2.27.0.389.gc38d7665816-goog
>
