Pádraig Brady: > Now that test-lock.c is relatively fast on numa/multicore systems, > it seems like it would be useful to first alarm(30) or something > to protect against infinite hangs?
If we could not pinpoint the origin of the problem, I agree, an alarm(30) would be the right thing to prevent an infinite hang. But by now, we know 1) It's a glibc bug: The test [6] fails even after it has set the policies that POSIX expects for the "writers get the rwlock in preference to readers guarantee". 2) Without this guarantee, a reader function that repeatedly spends I milliseconds in a section protected by the rwlock, O milliseconds without the rwlock being held, in a system with N reader threads in parallel will lead to - a successful termination if N * I / (I + O) < 1.0 - an infinite hang if N * I / (I + O) > 1.0 (There is actually no discontinuity at 1.0; need to use probability calculus for a more detailed analysis.) So, in order to make test_rwlock hang-tree, I would need to introduce a sleep() without the rwlock being held, and the duration of this sleep would be at least (N - 1) * I. Now, asking an application writer to add sleep()s in his code, with a duration that depends both on the number of threads and on the time spent in specific portions of the code, is outrageous. So, as it stands, POSIX rwlock without a "writers get preference" guarantee is unusable. I propose to do what we usually do in gnulib, to work around bugs and unusable APIs: - Write a configure test for the guarantee, based on [6]. - Modify the 'lock' module to use its own implementation of rwlock. - Add a unit test to verify the guarantee (so that we can also detect if the same problem occurs in pth or Solaris), again based on [6]. Patch in preparation... Bruno [6] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-2.c
diff --git a/lib/glthread/lock.c b/lib/glthread/lock.c index 6760bbd..215991e 100644 --- a/lib/glthread/lock.c +++ b/lib/glthread/lock.c @@ -30,7 +30,7 @@ /* ------------------------- gl_rwlock_t datatype ------------------------- */ -# if HAVE_PTHREAD_RWLOCK +# if HAVE_PTHREAD_RWLOCK && !(__GNU_LIBRARY__ > 1) # if !defined PTHREAD_RWLOCK_INITIALIZER @@ -152,11 +152,9 @@ glthread_rwlock_rdlock_multithreaded (gl_rwlock_t *lock) if (err != 0) return err; /* Test whether only readers are currently running, and whether the runcount - field will not overflow. */ - /* POSIX says: "It is implementation-defined whether the calling thread - acquires the lock when a writer does not hold the lock and there are - writers blocked on the lock." Let's say, no: give the writers a higher - priority. */ + field will not overflow, and whether no writer is waiting. The latter + condition is because POSIX recommends that "write locks shall take + precedence over read locks", to avoid "writer starvation". */ while (!(lock->runcount + 1 > 0 && lock->waiting_writers_count == 0)) { /* This thread has to wait for a while. Enqueue it among the @@ -796,8 +794,10 @@ glthread_rwlock_rdlock_func (gl_rwlock_t *lock) } EnterCriticalSection (&lock->lock); /* Test whether only readers are currently running, and whether the runcount - field will not overflow. */ - if (!(lock->runcount + 1 > 0)) + field will not overflow, and whether no writer is waiting. The latter + condition is because POSIX recommends that "write locks shall take + precedence over read locks", to avoid "writer starvation". */ + if (!(lock->runcount + 1 > 0 && lock->waiting_writers.count == 0)) { /* This thread has to wait for a while. Enqueue it among the waiting_readers. */ diff --git a/lib/glthread/lock.h b/lib/glthread/lock.h index 3ea98d6..c808d0e 100644 --- a/lib/glthread/lock.h +++ b/lib/glthread/lock.h @@ -176,7 +176,7 @@ typedef pthread_mutex_t gl_lock_t; /* ------------------------- gl_rwlock_t datatype ------------------------- */ -# if HAVE_PTHREAD_RWLOCK +# if HAVE_PTHREAD_RWLOCK && !(__GNU_LIBRARY__ > 1) # ifdef PTHREAD_RWLOCK_INITIALIZER