selftests/bpf test_sockmap failure

Yonghong Song Tue, 24 Jul 2018 08:47:05 -0700

In one of our production machines, tools/testing/selftests/bpf
test_sockmap failed randomly like below:

...

[TEST 78]: (512, 1, 1, sendmsg, pass,apply 1,): rx thread exited with err 1. FAILED

...

...

[TEST 80]: (2, 1024, 256, sendmsg, pass,apply 1,): rx thread exited with err 1. FAILED

...

...

[TEST 83]: (100, 1, 5, sendpage, pass,apply 1,): rx thread exited with err 1. FAILED

...

...

[TEST 79]: (512, 1, 1, sendpage, pass,apply 1,): rx thread exited with err 1. FAILED

...

The command line is just `test_sockmap`. The machine has 80 CPUs and
256G of memory. The kernel is based on 4.16 with the latest bpf-next
bpf changes backported.


Which test fails (78, 79, 80, or 83) is random, but the failing tests
all share these characteristics:
   . the rate option is greater than one, i.e., the forked sender
     process issues more than one sendmsg/sendpage.
   . txmsg_apply is not 0.

I debugged a little bit. The failure happens in the "unexpected
timeout" path of the msg_loop() function, shown below.

...

                        slct = select(max_fd + 1, &w, NULL, NULL, &timeout);
                        if (slct == -1) {
                                perror("select()");
                                clock_gettime(CLOCK_MONOTONIC, &s->end);
                                goto out_errno;
                        } else if (!slct) {
                                if (opt->verbose)
                                        fprintf(stderr, "unexpected timeout\n");
                                errno = -EIO;
                                clock_gettime(CLOCK_MONOTONIC, &s->end);
                                goto out_errno;
                        }
...

It appears that when the error happens, the receive process does not
receive all bytes sent from the send process and eventually times out.
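
To make the symptom concrete, here is a minimal sketch of a receive loop
that waits with select() and reports how many bytes are still missing
when the wait times out. It is not the test_sockmap code:
missing_after_timeout() and its parameters are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of test_sockmap: read from fd until
 * `expected` bytes have arrived, the peer closes, or select() times
 * out after timeout_sec seconds of silence.
 *
 * Returns the number of bytes still missing (0 when everything
 * arrived), or -1 on a select()/read() error.
 */
static long missing_after_timeout(int fd, long expected, int timeout_sec)
{
	char buf[4096];
	long received = 0;

	assert(expected >= 0);

	while (received < expected) {
		/* select() may modify the timeout, so reset it every pass. */
		struct timeval timeout = { .tv_sec = timeout_sec, .tv_usec = 0 };
		long want = expected - received;
		fd_set r;
		ssize_t n;
		int slct;

		FD_ZERO(&r);
		FD_SET(fd, &r);

		slct = select(fd + 1, &r, NULL, NULL, &timeout);
		if (slct == -1)
			return -1;
		if (slct == 0) {
			/* Same situation as the "unexpected timeout" path. */
			fprintf(stderr, "timeout: got %ld of %ld bytes\n",
				received, expected);
			break;
		}

		/* Never read past the expected total. */
		if (want > (long)sizeof(buf))
			want = (long)sizeof(buf);
		n = read(fd, buf, (size_t)want);
		if (n < 0)
			return -1;
		if (n == 0)
			break;	/* peer closed the connection */
		received += n;
	}

	return expected - received;
}
```

For example, with a socketpair() where the sender writes only 2 of 5
expected bytes, missing_after_timeout(fd, 5, 1) returns 3 after a one
second wait. Logging the shortfall this way would show whether the
receiver in the failing tests gets part of the data or none of it.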


Has anybody seen this issue as well?
John, any comments on this failure?

Thanks,

Yonghong
