# submitters and upstream can reproduce it tags 673596 - unreproducible clone 673596 -1 retitle -1 pthread_join returns before child thread has been reaped forwarded -1 http://thread.gmane.org/gmane.comp.lib.glibc.user/1356/focus=1359 # difficult severity -1 wishlist forwarded 673596 http://thread.gmane.org/gmane.comp.lib.glibc.user/1356 block 673596 by -1 tags 673596 + patch quit
Thibaut Girka wrote[1]: > I've recently tried to build eglibc from Debian's source package, > but it aborted because of a failed test[1]: nptl/tst-eintr1. > This test failed because one of the pthread_create calls returned EAGAIN. > Indeed, after some time, instead of 12~23 threads, > “grep Threads /proc/$PID/status” reveals much more threads (up to several > thousands). Siddhesh Poyarekar explained[2]: > On a serious note, I see this all the time on my tests runs and it is > in fact a kernel issue. The problem here is that the kernel needs to > signal the joining thread before it exits and the time taken from that > point to actual freeing of task resources seems to be longer than the > time it takes for the process to spawn another thread after joining > and for that thread to return. > > A very simple way to eliminate this is to introduce a very small > usleep within the thread function. Debian eglibc maintainers: how about this patch (untested)? It modifies tst-eintr1 to try again instead of failing when hitting the race. if (e == EAGAIN) continue; An alternative would be to mark tst-eintr1 as an expected failure, but I prefer this workaround since it means other kinds of failure in the test can still be caught. It is not clear to me whether the underlying problem is a bug in pthread_join() or just a quality of implementation issue (in which case it would also be a bug in the testsuite). POSIX defines pthread_join() as waiting for the target thread's termination: The pthread_join() function shall suspend execution of the calling thread until the target thread terminates, unless the target thread has already terminated. It seems to imply that joined threads do not count towards an implementation's limits: It is unspecified whether a thread that has exited but remains unjoined counts against {PTHREAD_THREADS_MAX}. However, the (non-normative) rationale suggests that pthread_join() can be emulated through a procedure that would be subject to the same race that breaks tst-eintr1: It is true that a programmer could simulate this function if it were not provided by passing extra state as part of the argument to the start_routine(). The terminating thread would set a flag to indicate termination and broadcast a condition that is part of that state; a joining thread would wait on that condition variable. [...] Thus, while not a primitive, including pthread_join() in this volume of POSIX.1-2008 was considered valuable. Meh. Hope that helps, Jonathan [1] http://thread.gmane.org/gmane.comp.lib.glibc.user/1356 [2] http://thread.gmane.org/gmane.comp.lib.glibc.user/1356/focus=1358
Index: debian/changelog =================================================================== --- debian/changelog (revision 5284) +++ debian/changelog (working copy) @@ -11,6 +11,11 @@ * patches/hurd-i386/libpthread_nort.diff: Add patch to revert upstream librt usage. + [ Jonathan Nieder ] + * patches/any/unsubmitted-tst-eintr1-eagain.diff: new patch to work around a + race that lets pthread_create hit resource limits when the kernel takes + too long to clean up after joined threads. Closes: #673596. + -- Aurelien Jarno <aure...@debian.org> Sun, 03 Jun 2012 22:51:53 +0200 eglibc (2.13-33) unstable; urgency=medium Index: debian/patches/any/unsubmitted-tst-eintr1-eagain.diff =================================================================== --- debian/patches/any/unsubmitted-tst-eintr1-eagain.diff (revision 0) +++ debian/patches/any/unsubmitted-tst-eintr1-eagain.diff (working copy) @@ -0,0 +1,25 @@ +2012-06-06 Jonathan Nieder <jrnie...@gmail.com> + + * nptl/tst-eintr1.c (tf1): Tolerate EAGAIN from pthread_create. + +--- + +--- a/nptl/tst-eintr1.c ++++ b/nptl/tst-eintr1.c +@@ -49,6 +49,16 @@ + puts ("pthread_create returned EINTR"); + exit (1); + } ++ if (e == EAGAIN) ++ { ++ /* The kernel might not have processed the last few ++ pthread_join()s yet. Tolerate that, but record the ++ event in test output so attentive people reading ++ logs can notice if pthread_join() stops working ++ altogether. */ ++ write (STDOUT_FILENO, "!", 1); ++ continue; ++ } + + char buf[100]; + printf ("tf1: pthread_create failed: %s\n", Index: debian/patches/series =================================================================== --- debian/patches/series (revision 5284) +++ debian/patches/series (working copy) @@ -361,3 +361,4 @@ any/local-sunrpc-dos.diff any/cvs-ld.so-rpath-origin.diff any/cvs-pthread-builtin-expect.diff +any/unsubmitted-tst-eintr1-eagain.diff