Hi,
I have done some testing; it seems two tests fail most often:
tcp_recv_two_quota and tcp_noresponse
PID 32090 exceeded run time limit, sending SIGKILL
Do you know why just those tests time out so often?
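For context on the SIGKILL message above: a runner that enforces a wall-clock limit typically forks the test, polls for it to finish, and kills it once the deadline passes. The C sketch below is a hypothetical illustration of that pattern, not BIND's actual test runner. run_with_limit(), fast_test() and slow_test() are made-up names, and the printed message only mimics the log line quoted above. Because the limit is measured in wall-clock time, a test that is starved of CPU on a heavily loaded builder can exceed it even if it would pass comfortably on an idle machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result of running one test body under a wall-clock limit. */
typedef struct {
	bool timed_out; /* true if the watchdog had to send SIGKILL */
	int wstatus;    /* raw status returned by waitpid() */
} run_result_t;

/* Milliseconds elapsed on the monotonic clock since 'start'. */
static long
elapsed_ms(const struct timespec *start) {
	struct timespec now;
	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical watchdog: run 'body' in a child process and kill it with
 * SIGKILL if it is still running after 'limit_ms' of wall-clock time.
 * A child starved of CPU (e.g. by many parallel make jobs) can exceed
 * the limit even though it would pass on an idle machine.
 */
static run_result_t
run_with_limit(void (*body)(void), long limit_ms) {
	run_result_t res = { false, 0 };
	struct timespec start;

	assert(limit_ms > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);

	pid_t pid = fork();
	if (pid == 0) {
		body();   /* child: run the test body */
		_exit(0); /* and report success if it returns */
	}
	assert(pid > 0); /* fork() failed otherwise */

	for (;;) {
		if (waitpid(pid, &res.wstatus, WNOHANG) == pid) {
			return res; /* child finished on its own */
		}
		if (elapsed_ms(&start) > limit_ms) {
			fprintf(stderr,
				"PID %ld exceeded run time limit, "
				"sending SIGKILL\n",
				(long)pid);
			kill(pid, SIGKILL);
			waitpid(pid, &res.wstatus, 0); /* reap the child */
			res.timed_out = true;
			return res;
		}
		/* Poll every 10 ms. */
		struct timespec nap = { 0, 10L * 1000000L };
		nanosleep(&nap, NULL);
	}
}

/* Hypothetical test bodies, used only for illustration. */
static void
fast_test(void) {
	/* returns immediately */
}

static void
slow_test(void) {
	sleep(5); /* far longer than the limit used below */
}
```

With these definitions, run_with_limit(slow_test, 200) should come back with timed_out set and a status for which WTERMSIG() is SIGKILL, while run_with_limit(fast_test, 1000) exits normally with status 0.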
I have also found some strange issues while trying to reproduce this
on my local machine.
When repeatedly running make -j8 check in the tests/isc build directory,
the test often fails with just exit status 255 and no further details.
For example, netmgr_test.log contains:
[ RUN ] tcp_half_recv_half_send_sendback
[ OK ] tcp_half_recv_half_send_sendback
[ RUN ] tcp_recv_one_quota
[ OK ] tcp_recv_one_quota
[ RUN ] tcp_recv_two_quota
[ OK ] tcp_recv_two_quota
[ RUN ] tcp_recv_send_quota
[ OK ] tcp_recv_send_quota
[ RUN ] tcp_recv_half_send_quota
[ OK ] tcp_recv_half_send_quota
[ RUN ] tcp_half_recv_send_quota
FAIL netmgr_test (exit status: 255)
What might cause this kind of termination? Since it does not happen
when the test is run on its own, I cannot step through it with gdb. I
think it happens only when running multiple make jobs, in my case
make -j8 (I have 4 cores with hyperthreading).
It happens in roughly 20% of runs; I do not have exact numbers.
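One way to narrow down the exit status 255 is to compare it with what a killed process looks like. The hypothetical C sketch below assumes the harness reports the status the way common shells such as bash and dash report $?. It decodes a waitpid() status: a normal exit gives the exit code itself (so exit(-1) and exit(255) both show up as 255, since only the low 8 bits survive), while death by a signal gives 128 plus the signal number, e.g. 137 for SIGKILL, or 134 for SIGABRT on Linux. Under that assumption, a bare 255 would point to the test program exiting on its own rather than being killed by a run-time-limit watchdog. shell_exit_code(), child_status() and the child functions are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Convert a raw waitpid() status into the number bash or dash would
 * put in $?: the exit code for a normal exit, or 128 + signal number
 * for a process terminated by a signal.
 */
static int
shell_exit_code(int wstatus) {
	if (WIFEXITED(wstatus)) {
		return WEXITSTATUS(wstatus);
	}
	if (WIFSIGNALED(wstatus)) {
		return 128 + WTERMSIG(wstatus);
	}
	return -1; /* stopped or continued: not a termination */
}

/* Fork a child that runs 'action' and return its raw wait status. */
static int
child_status(void (*action)(void)) {
	int wstatus = 0;
	pid_t pid = fork();
	assert(pid >= 0);
	if (pid == 0) {
		action();
		_exit(0);
	}
	waitpid(pid, &wstatus, 0);
	return wstatus;
}

/* Hypothetical child behaviours. */
static void
exit_minus_one(void) {
	_exit(-1); /* only the low 8 bits survive: reported as 255 */
}

static void
sigkill_self(void) {
	raise(SIGKILL); /* what a run-time-limit watchdog would do */
}
```

For example, shell_exit_code(child_status(exit_minus_one)) yields 255, while shell_exit_code(child_status(sigkill_self)) yields 137.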
Do such issues also happen on BIND's infrastructure on GitLab?
Regards,
Petr
On 8/29/22 22:57, PGNet Dev wrote:
I'm building bind9 (v9.18.5, atm) on Fedora's COPR infrastructure.
Building for Fedora 36, 37 & Rawhide, the builds FAIL randomly and
intermittently on COPR.
For example, with no changes to any source or spec, simply triggering
rebuilds over a period of just a few hours:
Time                  F36   F37   Rawhide  Build URL
--------------------  ----  ----  -------  ---------
2022-08-29 15:58 EDT  OK    FAIL  OK       https://copr.fedorainfracloud.org/coprs/pgfed/bind/build/4784469/
2022-08-29 14:23 EDT  FAIL  OK    OK       https://copr.fedorainfracloud.org/coprs/pgfed/bind/build/4784210/
2022-08-29 11:49 EDT  OK    OK    OK       https://copr.fedorainfracloud.org/coprs/pgfed/bind/build/4776394/
I'm trying to get a handle on the cause ...
Local builds on my own infrastructure are always successful; the
issue occurs only on COPR.
The FAILs are always in the `netmgr_test` unit tests ...
Looking at the netmgr test source, my as-yet-unfounded suspicion is that
these timeouts
https://github.com/isc-projects/bind9/blob/v9_18_5/tests/isc/netmgr_test.c#L116
are intermittently being hit, only in COPR/online. Perhaps for a
specific transport?
I also note that, in upstream main 3 days ago, the netmgr tests were
split up into separate per-transport tests:
https://github.com/isc-projects/bind9/commit/37a1be5acc32244cec03cedc1bd46bc4aa0fbc18
I'm not clear on what specific problem that split solves, but I
imagine it might well have an effect on builds at COPR.
I've not been able to get detailed test FAIL logs from COPR builds
(local builds do not FAIL). In #fedora-buildsys, I did manage to get
a reproducer of the build FAIL; I'm hoping to get access to those
FAIL logs via a manual COPR build.
Has anyone here seen similar issues with netmgr, or does anyone have a clue?
FWIW, I've already filed this at RH BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=2122010
No response there yet.
--
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from
this list
ISC funds the development of this software with paid support subscriptions.
Contact us at https://www.isc.org/contact/ for more information.
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users