Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-06 Pavel Raiskup
On Thursday, January 5, 2017 7:51:00 PM CET Bruno Haible wrote: > Pavel Raiskup wrote: > > Thanks. Minor report is that gl_thread_join() is not handled properly for > > joined thread statuses. > > > > This leads to a situation where the Koji build system tries to gently terminate > > the build first (af

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-05 Bruno Haible
Pavel Raiskup wrote: > Thanks. Minor report is that gl_thread_join() is not handled properly for > joined thread statuses. > > This leads to a situation where the Koji build system tries to gently terminate > the build first (after two days) ... which (sometimes?) leads to a successful > 'test-lock' run i

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-05 Torvald Riegel
IMO, users of reader-writer locks should treat them as a mutual-exclusion mechanism. That is, a mechanism that just ensures that two critical sections will not execute concurrently (except if both are readers, of course), so at the same time. It is also important to understand what this does not
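
The exclusion property described here can be observed directly. The sketch below uses plain POSIX threads; demo_lock, try_read and probe_while_holding are hypothetical names. One thread holds the lock, and a second thread asks, without blocking, whether a reader could enter. With no writer waiting, a second reader is admitted while the lock is held for reading and refused with EBUSY while it is held for writing. The sketch checks exclusion only; it says nothing about which waiting thread is admitted next.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock used only for this demonstration. */
static pthread_rwlock_t demo_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Second thread: try to enter as a reader without blocking and store
   the result (0 on success, EBUSY if a reader may not enter now). */
static void *try_read(void *arg)
{
    int *result = arg;
    *result = pthread_rwlock_tryrdlock(&demo_lock);
    if (*result == 0)
        pthread_rwlock_unlock(&demo_lock);
    return NULL;
}

/* Hold demo_lock for reading (as_writer == 0) or writing (as_writer != 0)
   and report whether another thread could enter as a reader meanwhile. */
static int probe_while_holding(int as_writer)
{
    pthread_t t;
    int result = -1;

    if (as_writer)
        pthread_rwlock_wrlock(&demo_lock);
    else
        pthread_rwlock_rdlock(&demo_lock);

    pthread_create(&t, NULL, try_read, &result);
    pthread_join(t, NULL);   /* result is valid once the thread is joined */

    pthread_rwlock_unlock(&demo_lock);
    return result;
}
```

probe_while_holding(0) returns 0 (two readers overlap), and probe_while_holding(1) returns EBUSY (a writer excludes readers).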

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-05 Pavel Raiskup
Thanks. Minor report is that gl_thread_join() is not handled properly for joined thread statuses. This leads to a situation where the Koji build system tries to gently terminate the build first (after two days) ... which (sometimes?) leads to a successful 'test-lock' run in the end and the build succeeds

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-05 Bruno Haible
Pavel Raiskup wrote: > I still see infinite hang on ppc64le (sometimes), as discussed in [1]. It > looks like starvation of writers in test_rwlock(). > > Could we set PTHREAD_RWLOCK_PREFER_WRITER_NP (in test-lock.c) to avoid > those issues? Here's what I'm pushing. You were right with your intu
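
For reference, on glibc a writer preference is requested through the non-portable pthread_rwlockattr_setkind_np(). According to the pthread_rwlockattr_setkind_np(3) manual page, glibc ignores PTHREAD_RWLOCK_PREFER_WRITER_NP (the lock then behaves like the default reader preference); PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP is the value that actually prefers writers, on condition that no thread takes a read lock recursively. The following is a minimal sketch under that assumption; it is not the patch pushed in this thread, and init_writer_preferring_rwlock is a hypothetical helper.

```c
#define _GNU_SOURCE  /* for pthread_rwlockattr_setkind_np (glibc extension) */
#include <assert.h>
#include <pthread.h>

/* Initialize *lock so that waiting writers keep new readers out.
   Returns 0 on success or an error number from the pthread calls.
   Callers must never take the read lock recursively: with writer
   preference, a thread re-acquiring a read lock while a writer waits
   would deadlock. */
static int init_writer_preferring_rwlock(pthread_rwlock_t *lock)
{
    pthread_rwlockattr_t attr;
    int err = pthread_rwlockattr_init(&attr);
    if (err != 0)
        return err;

    /* PTHREAD_RWLOCK_PREFER_WRITER_NP would be silently ignored by glibc. */
    err = pthread_rwlockattr_setkind_np(&attr,
              PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    if (err == 0)
        err = pthread_rwlock_init(lock, &attr);

    pthread_rwlockattr_destroy(&attr);
    return err;
}
```

The trade-off is visible in the comment: writer preference removes the starvation risk only if readers never nest, which is why glibc does not make it the default.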

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Pavel Raiskup
On Wednesday, January 4, 2017 3:17:01 PM CET Bruno Haible wrote: > Pádraig Brady: > > Now that test-lock.c is relatively fast on numa/multicore systems, > > it seems like it would be useful to first alarm(30) or something > > to protect against infinite hangs? > > If we could not pinpoint the orig

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Bruno Haible
Pavel Raiskup wrote: > As gnulib is a portability library, it would probably be nice if gnulib > automatically set the appropriate policy according to the actual specifications (even if > we had to set the policy by non-portable calls). The 2-2.c test shows that it still fails, even if the right policy i

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Bruno Haible
Pádraig Brady: > Now that test-lock.c is relatively fast on numa/multicore systems, > it seems like it would be useful to first alarm(30) or something > to protect against infinite hangs? If we could not pinpoint the origin of the problem, I agree, an alarm(30) would be the right thing to prevent
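
The alarm(30) safeguard could look roughly like this. It is a sketch of the idea, not the code gnulib adopted; arm_hang_watchdog and on_watchdog_timeout are hypothetical names. Installing a handler is optional, since the default action for SIGALRM already terminates the process, but it lets the test print a clear message before exiting with a failure status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Runs when the deadline expires. Only async-signal-safe calls
   (write, _exit) are used here. */
static void on_watchdog_timeout(int sig)
{
    static const char msg[] = "test timed out; assuming a hang\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) ignored;
    (void) sig;
    _exit(EXIT_FAILURE);
}

/* Arrange for the process to fail instead of hanging if it is still
   running after `seconds` seconds. Returns 0 on success, -1 on error. */
static int arm_hang_watchdog(unsigned int seconds)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_watchdog_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    alarm(seconds);   /* replaces any previously scheduled alarm */
    return 0;
}
```

A test would call arm_hang_watchdog(30) at the start of main(); calling alarm(0) cancels the pending alarm.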

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Pavel Raiskup
On Wednesday, January 4, 2017 1:19:36 PM CET Pavel Raiskup wrote: > I don't want to claim rwlocks are not reliable. IMO rwlocks do what we > ask them to do... One writer OR multiple readers. > > The question is what should be the default policy ... who should be more > privileged by default (writers
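
Which side a given implementation favors by default can be probed empirically. The sketch below assumes glibc (it uses the non-portable pthread_rwlockattr_setkind_np()), and all function and variable names are hypothetical. The main thread holds a read lock, a second thread blocks in pthread_rwlock_wrlock(), and a third thread then calls pthread_rwlock_tryrdlock(). With glibc's default reader preference the late reader is expected to get in (result 0); with PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP it is expected to get EBUSY. The fixed 100 ms pause only makes it likely that the writer is already waiting, so a slow machine can still yield 0 in the writer-preferring case.

```c
#define _GNU_SOURCE  /* for pthread_rwlockattr_setkind_np (glibc extension) */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_rwlock_t probe_lock;

/* Blocks until the main thread drops its read lock. */
static void *waiting_writer(void *arg)
{
    (void) arg;
    pthread_rwlock_wrlock(&probe_lock);
    pthread_rwlock_unlock(&probe_lock);
    return NULL;
}

/* Arrives while a reader holds the lock and a writer (probably) waits. */
static void *late_reader(void *arg)
{
    int *result = arg;
    *result = pthread_rwlock_tryrdlock(&probe_lock);
    if (*result == 0)
        pthread_rwlock_unlock(&probe_lock);
    return NULL;
}

/* Returns 0 if the late reader overtook the waiting writer, EBUSY if the
   writer kept it out. Error checking is omitted for brevity. */
static int probe_reader_vs_waiting_writer(int prefer_writers)
{
    pthread_rwlockattr_t attr;
    pthread_t writer, reader;
    struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int result = -1;

    pthread_rwlockattr_init(&attr);
    if (prefer_writers)
        pthread_rwlockattr_setkind_np(&attr,
            PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(&probe_lock, &attr);
    pthread_rwlockattr_destroy(&attr);

    pthread_rwlock_rdlock(&probe_lock);          /* first reader */
    pthread_create(&writer, NULL, waiting_writer, NULL);
    nanosleep(&delay, NULL);                     /* let the writer block */

    pthread_create(&reader, NULL, late_reader, &result);
    pthread_join(reader, NULL);

    pthread_rwlock_unlock(&probe_lock);          /* admit the writer */
    pthread_join(writer, NULL);
    pthread_rwlock_destroy(&probe_lock);
    return result;
}
```

A reader-preferring default is exactly what allows a steady stream of readers to starve a writer; the probe shows the policy, not whether starvation actually occurs in a given run.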

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Pavel Raiskup
Hi Bruno, On Wednesday, January 4, 2017 11:54:27 AM CET Bruno Haible wrote: > Hi Pavel, > > > Can we assume all systems supporting pthreads conform to this > > spec? That was a pretty big (and pretty recent) change of the specification. > > The change in the specification [4] mentions that th

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Bruno Haible
> and it points to two tests from the Linux test project [5][6]. Can you run > these tests on your Koji system? For me, these two tests fail on a glibc-2.23 system: $ wget https://raw.githubusercontent.com/linux-test-project/ltp/master/testcases/open_posix_testsuite/conformance/interfaces/pthrea

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Pádraig Brady
On 04/01/17 10:54, Bruno Haible wrote: > Hi Pavel, > >> Can we assume all systems supporting pthreads conform to this >> spec? That was a pretty big (and pretty recent) change of the specification. > > The change in the specification [4] mentions that the issue arose with glibc, > and it point

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Bruno Haible
Hi Pavel, > Can we assume all systems supporting pthreads conform to this > spec? That was a pretty big (and pretty recent) change of the specification. The change in the specification [4] mentions that the issue arose with glibc, and it points to two tests from the Linux test project [5][6].

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-04 Pavel Raiskup
On Wednesday, January 4, 2017 12:43:17 AM CET Bruno Haible wrote: > Pavel Raiskup wrote: > > POSIX says (for pthread_rwlock_wrlock()): > > > > Implementations may favor writers over readers to avoid writer > > starvation. > > > > But that's too far from the 'shall favor' wording. > > You must

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-03 Bruno Haible
Pavel Raiskup wrote: > POSIX says (for pthread_rwlock_wrlock()): > > Implementations may favor writers over readers to avoid writer starvation. > > But that's too far from the 'shall favor' wording. You must be looking at an old version of POSIX [1]. POSIX:2008 specifies [2]: [TPS] If ... the

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-03 Pavel Raiskup
Hello Berny, On Monday, January 2, 2017 8:02:03 PM CET Bernhard Voelker wrote: > On 01/02/2017 05:37 PM, Pavel Raiskup wrote: > > On Monday, January 2, 2017 4:50:28 PM CET Bruno Haible wrote: > >> Especially since the problem occurs only on one architecture. > > > > I've been able to reproduce th

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-02 Bernhard Voelker
On 01/02/2017 05:37 PM, Pavel Raiskup wrote: > On Monday, January 2, 2017 4:50:28 PM CET Bruno Haible wrote: >> Especially since the problem occurs only on one architecture. > > I've been able to reproduce this on i686 in the meantime too, sorry -- I just > reported what I observed :(. See [1].

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-02 Pavel Raiskup
On Monday, January 2, 2017 4:50:28 PM CET Bruno Haible wrote: > Hi Pavel, > > > One thing I'm afraid of is that writers could finish too > > early. Could we artificially slow them down? > > In test_rwlock the test does this: > > /* Wait for the threads to terminate. */ > for (i = 0;

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-02 Bruno Haible
Hi Pavel, > One thing I'm afraid of is that writers could finish too > early. Could we artificially slow them down? In test_rwlock the test does this: /* Wait for the threads to terminate. */ for (i = 0; i < THREAD_COUNT; i++) gl_thread_join (threads[i], NULL); set_atomic_int_v
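
The shape described here (join the worker threads, then flip a shared flag so the remaining threads stop) can be sketched as follows, with the writer-starvation hazard reported later in the thread made explicit: readers that re-take a reader-preferring lock back to back may keep writers out, so the join on the writers does not return. This is a simplified model, not the actual test_rwlock code; the names, thread counts and the short pause in the reader loop are assumptions chosen so the demo terminates reliably.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define N_READERS 2
#define N_WRITERS 2
#define WRITER_ITERATIONS 1000

static pthread_rwlock_t data_lock = PTHREAD_RWLOCK_INITIALIZER;
static long data_value;            /* protected by data_lock */
static atomic_int writers_done;    /* set once all writers have been joined */

static void *reader_main(void *arg)
{
    struct timespec gap = { 0, 1000 };   /* about 1 us spent outside the lock */
    (void) arg;
    while (!atomic_load(&writers_done)) {
        pthread_rwlock_rdlock(&data_lock);
        long v = data_value;
        pthread_rwlock_unlock(&data_lock);
        assert(v >= 0 && v <= (long) N_WRITERS * WRITER_ITERATIONS);
        /* Without this gap the readers can overlap so tightly that a
           reader-preferring lock is rarely free for a writer. */
        nanosleep(&gap, NULL);
    }
    return NULL;
}

static void *writer_main(void *arg)
{
    (void) arg;
    for (int i = 0; i < WRITER_ITERATIONS; i++) {
        pthread_rwlock_wrlock(&data_lock);
        data_value++;
        pthread_rwlock_unlock(&data_lock);
    }
    return NULL;
}

/* Start readers and writers, wait for the writers, then tell the
   readers to stop. Returns the final counter value. */
static long run_rwlock_scenario(void)
{
    pthread_t readers[N_READERS], writers[N_WRITERS];

    data_value = 0;
    atomic_store(&writers_done, 0);
    for (int i = 0; i < N_READERS; i++)
        pthread_create(&readers[i], NULL, reader_main, NULL);
    for (int i = 0; i < N_WRITERS; i++)
        pthread_create(&writers[i], NULL, writer_main, NULL);

    /* Under writer starvation, this is the loop that never finishes. */
    for (int i = 0; i < N_WRITERS; i++)
        pthread_join(writers[i], NULL);
    atomic_store(&writers_done, 1);
    for (int i = 0; i < N_READERS; i++)
        pthread_join(readers[i], NULL);

    return data_value;
}
```

run_rwlock_scenario() returns N_WRITERS * WRITER_ITERATIONS (2000) once every writer has completed.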

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2017-01-02 Pavel Raiskup
On Saturday, December 24, 2016 6:52:07 PM CET Bruno Haible wrote: > Wow, a 30x speed increase by using a lock instead of 'volatile'! > > Thanks for the testing. I cleaned up the patch to do less > code duplication and pushed it. Thanks, that's a nice speedup! And sorry for the delay. I still see

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-24 Bruno Haible
> > What happens when a program reads from a 'volatile' variable > > at address xy in a multi-processor system? Looking at the assembler code produced by GCC: GCC does not emit any barrier or similar instructions for reads or writes to 'volatile' variables. So, the loop in test_lock actually waits
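
Because a 'volatile' access implies no synchronization between threads, the fix discussed in this thread uses a lock instead. A minimal sketch of that idea with plain POSIX threads follows; struct locked_int, its functions and demo_flag_handoff are hypothetical names, and a pthread mutex stands in for gnulib's own lock type. C11 atomic_int would be another standard way to get the same guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* An int that is only read or written while holding its mutex, so a
   read that takes the mutex after a write released it sees that write. */
struct locked_int {
    pthread_mutex_t lock;
    int value;
};

static void locked_int_init(struct locked_int *li, int value)
{
    pthread_mutex_init(&li->lock, NULL);
    li->value = value;
}

static int locked_int_get(struct locked_int *li)
{
    pthread_mutex_lock(&li->lock);
    int value = li->value;
    pthread_mutex_unlock(&li->lock);
    return value;
}

static void locked_int_set(struct locked_int *li, int value)
{
    pthread_mutex_lock(&li->lock);
    li->value = value;
    pthread_mutex_unlock(&li->lock);
}

static struct locked_int ready;
static int payload;   /* plain int, published through 'ready' */

static void *consumer(void *arg)
{
    int *seen = arg;
    while (locked_int_get(&ready) == 0)
        sched_yield();            /* poll without hogging the CPU */
    *seen = payload;              /* ordered after the producer's set() */
    return NULL;
}

/* Hand a value from the calling thread to a consumer thread via the flag. */
static int demo_flag_handoff(void)
{
    pthread_t t;
    int seen = 0;

    locked_int_init(&ready, 0);
    pthread_create(&t, NULL, consumer, &seen);
    payload = 42;
    locked_int_set(&ready, 1);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&ready.lock);
    return seen;
}
```

demo_flag_handoff() returns 42: the mutex orders the write of payload before the consumer's read, which a 'volatile' flag alone would not guarantee.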

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-24 Pádraig Brady
On 24/12/16 17:52, Bruno Haible wrote: > Hi Pádraig, > >> Wow that's much better on a 40 core system: >> >> Before your patch: >> = >> $ time ./test-lock >> Starting test_lock ... OK >> Starting test_rwlock ... OK >> Starting test_recursive_lock ... OK >> Starting test_once ... OK

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-24 Bruno Haible
Hi Pádraig, > Wow that's much better on a 40 core system: > > Before your patch: > = > $ time ./test-lock > Starting test_lock ... OK > Starting test_rwlock ... OK > Starting test_recursive_lock ... OK > Starting test_once ... OK > > real 1m32.547s > user 1m32.455s > sys

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-23 Pádraig Brady
On 22/12/16 21:58, Bruno Haible wrote: > Pádraig Brady wrote: >> There was a recent enough report on helgrind reporting issues with it: >> https://lists.gnu.org/archive/html/bug-gnulib/2015-07/msg00032.html > > I would view this as a false positive. The test uses some 'volatile' > variables to com

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-22 Bruno Haible
Pádraig Brady wrote: > There was a recent enough report on helgrind reporting issues with it: > https://lists.gnu.org/archive/html/bug-gnulib/2015-07/msg00032.html I would view this as a false positive. The test uses some 'volatile' variables to communicate among threads, and 'valgrind --tool=helg

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-22 Pádraig Brady
On 21/12/16 23:55, Bruno Haible wrote: > Hi Pavel, > >> But I'm mainly asking whether we know about recent issues. >> >> Firstly I noticed the hang on ppc64le, but later it occurred on x86_64, ppc64 >> and i686: https://koji.fedoraproject.org/koji/taskinfo?taskID=16970779 >> so this is not an arch-sp

Re: Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-21 Bruno Haible
Hi Pavel, > But I'm mainly asking whether we know about recent issues. > > Firstly I noticed the hang on ppc64le, but later it occurred on x86_64, ppc64 > and i686: https://koji.fedoraproject.org/koji/taskinfo?taskID=16970779 > so this is not an arch-specific issue. I can find these (old) reports:

Test-lock hang (not 100% reproducible) on GNU/Linux

2016-12-20 Pavel Raiskup
Hi all, has anybody experienced issues with 'test-lock'? I haven't been able to reproduce this on my box yet, so I need to get access to machines in Fedora's Koji (or get the builder specs), and I'll have a look at this ASAP. But I'm mainly asking whether we know about recent issues. Firstly I