On Wed, Jul 15, 2020 at 3:41 PM Stanislav Fomichev <s...@google.com> wrote:
>
> Andrii reported that sockopt_inherit occasionally hangs up on 5.5 kernel [0].
> This can happen if server_thread runs faster than the main thread.
> In that case, pthread_cond_wait will wait forever because
> pthread_cond_signal was executed before the main thread was blocking.
> Let's move pthread_mutex_lock up a bit to make sure server_thread
> runs strictly after the main thread goes to sleep.
>
> (Not sure why this is 5.5 specific, maybe scheduling is less
> deterministic? But I was able to confirm that it does indeed
> happen in a VM.)
>
> [0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isuayg-dfzgdy7qwykmm7rm...@mail.gmail.com/
>
> Reported-by: Andrii Nakryiko <andr...@fb.com>
> Signed-off-by: Stanislav Fomichev <s...@google.com>
> ---
Great, thanks for figuring this out! Hopefully this is it.

Acked-by: Andrii Nakryiko <andr...@fb.com>

>  tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c b/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
> index 8547ecbdc61f..ec281b0363b8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
> @@ -193,11 +193,10 @@ static void run_test(int cgroup_fd)
>  	if (CHECK_FAIL(server_fd < 0))
>  		goto close_bpf_object;
>
> +	pthread_mutex_lock(&server_started_mtx);
>  	if (CHECK_FAIL(pthread_create(&tid, NULL, server_thread,
>  				      (void *)&server_fd)))
>  		goto close_server_fd;
> -
> -	pthread_mutex_lock(&server_started_mtx);
>  	pthread_cond_wait(&server_started, &server_started_mtx);
>  	pthread_mutex_unlock(&server_started_mtx);
>
> --
> 2.27.0.389.gc38d7665816-goog
>
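For reference, here is a minimal standalone sketch of the handshake after this patch. It assumes, as the fix requires, that server_thread takes server_started_mtx before calling pthread_cond_signal, so it cannot signal until the main thread is inside pthread_cond_wait (which atomically releases the mutex). The run_handshake() wrapper and the server_ready flag are hypothetical and not part of the selftest. The flag is the usual POSIX predicate idiom, since pthread_cond_wait may also return spuriously.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t server_started_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t server_started = PTHREAD_COND_INITIALIZER;
/* Hypothetical predicate (not in the patch): set once the server is up. */
static bool server_ready = false;

/* Stand-in for the selftest's server_thread. Assumption: it signals with
 * the mutex held, so the signal cannot race ahead of the waiter. */
static void *server_thread(void *arg)
{
	(void)arg;
	/* ... the real test would set up its listening socket here ... */
	pthread_mutex_lock(&server_started_mtx);
	server_ready = true;
	pthread_cond_signal(&server_started);
	pthread_mutex_unlock(&server_started_mtx);
	return NULL;
}

/* Hypothetical wrapper: returns 0 once the server thread has signalled
 * and been joined, nonzero on failure. */
int run_handshake(void)
{
	pthread_t tid;

	/* Take the mutex before creating the thread, as the patch does. */
	pthread_mutex_lock(&server_started_mtx);
	if (pthread_create(&tid, NULL, server_thread, NULL)) {
		pthread_mutex_unlock(&server_started_mtx);
		return -1;
	}
	/* Re-check the predicate to tolerate spurious wakeups. */
	while (!server_ready)
		pthread_cond_wait(&server_started, &server_started_mtx);
	pthread_mutex_unlock(&server_started_mtx);

	return pthread_join(tid, NULL);
}
```

With the predicate loop in place, an early signal is no longer lost even without moving the lock: if server_thread finishes first, the main thread finds server_ready already set and skips pthread_cond_wait entirely.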