On Wednesday, January 4, 2017 3:17:01 PM CET Bruno Haible wrote:
> Pádraig Brady:
> > Now that test-lock.c is relatively fast on numa/multicore systems,
> > it seems like it would be useful to first alarm(30) or something
> > to protect against infinite hangs?
>
> If we could not pinpoint the origin of the problem, I agree, an alarm(30)
> would be the right thing to prevent an infinite hang.
>
> But by now, we know
>
> 1) It's a glibc bug: The test [6] fails even after it has set the
>    policies that POSIX expects for the "writers get the rwlock in preference
>    to readers guarantee".
>
> 2) Without this guarantee, a reader function that repeatedly spends
>      I milliseconds in a section protected by the rwlock,
>      O milliseconds without the rwlock being held,
>    in a system with N reader threads in parallel
>    will lead to
>      - a successful termination if N * I / (I + O) < 1.0
>      - an infinite hang if N * I / (I + O) > 1.0
>    (There is actually no discontinuity at 1.0; need to use probability
>    calculus for a more detailed analysis.)
>    So, in order to make test_rwlock hang-free, I would need to introduce
>    a sleep() without the rwlock being held, and the duration of this sleep
>    would be at least (N - 1) * I.
>
> Now, asking an application writer to add sleep()s in his code, with
> a duration that depends both on the number of threads and on the time
> spent in specific portions of the code, is outrageous.
>
> So, as it stands, POSIX rwlock without a "writers get preference" guarantee
> is unusable.
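
To make the arithmetic in 2) concrete, here is a minimal sketch in C. The
function and parameter names are hypothetical (they are not taken from
test-lock.c), and it only evaluates the two formulas quoted above; it does
not model the probabilistic behaviour near 1.0.

```c
#include <assert.h>

/* Load put on the rwlock by the readers, per the formula quoted above:
   N * I / (I + O).  Values above 1.0 mean the readers can, on average,
   keep the lock permanently held, so a writer may never get in.
   Hypothetical helper, for illustration only.  */
double
reader_load (int n_readers, double i_ms, double o_ms)
{
  assert (n_readers > 0 && i_ms >= 0.0 && o_ms >= 0.0 && i_ms + o_ms > 0.0);
  return n_readers * i_ms / (i_ms + o_ms);
}

/* Smallest time O spent outside the lock for which the load does not
   exceed 1.0.  Solving N * I / (I + O) <= 1.0 for O gives
   O >= (N - 1) * I.  Hypothetical helper, for illustration only.  */
double
min_unlocked_ms (int n_readers, double i_ms)
{
  assert (n_readers > 0 && i_ms >= 0.0);
  return (n_readers - 1) * i_ms;
}
```

For example, with 10 reader threads that each hold the lock for 1 ms and
spend 1 ms outside it, reader_load (10, 1.0, 1.0) is 5.0, far above the
threshold, and min_unlocked_ms (10, 1.0) is 9.0, i.e. each reader would
need to stay out of the lock for at least 9 ms per iteration.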

If we don't play with the probabilities a bit longer, I'm afraid this still
just moves the problem somewhere else ... because if writers had preference,
and they were able to hold the rwlock all the time, readers would starve.

I agree that gl_pthread_rwlock* should match the specification, but at least
in the actual algorithm in test_rwlock() we should make sure that some
readers are actually doing something _during_ the writers' typhoon... (this
is no longer a hang of test_rwlock(), but we certainly want to test
something...).

Pavel

> I propose to do what we usually do in gnulib, to work around bugs and unusable
> APIs:
>   - Write a configure test for the guarantee, based on [6].
>   - Modify the 'lock' module to use its own implementation of rwlock.
>   - Add a unit test to verify the guarantee (so that we can also detect
>     if the same problem occurs in pth or Solaris), again based on [6].
>
> Patch in preparation...
>
> Bruno
>
> [6]
> https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-2.c