Re: make-3.82 testcases fail sometimes

Matthias Hopf Tue, 31 Aug 2010 07:14:52 -0700

On Aug 31, 10 09:54:50 -0400, Paul Smith wrote:
> On Mon, 2010-08-30 at 19:52 +0200, Matthias Hopf wrote:
> > All except those in targets/SECONDARY (which I do not 100% understand
> > yet) are related to tests using sleep for parallelization tests -
> > something highly unreliable on systems with lots of processors and
> > high load.
> 
> Lots of processors shouldn't make a difference; if you "sleep 2" it'll
> wait for (at least) 2 seconds, no matter whether you have one or 4,096
> processors.


You typically don't put a high load on machines with few processors,
because stalling would become a real issue. On machines with many
processors a high load will typically stall processes only temporarily.
That's why I added this to the description.

> If your system is under very heavy load then I guess it could matter
> although having a "sleep 2" take 4 or more seconds before you wake up,
> for a simple command-line tool like make, seems like you would have to
> have a REALLY REALLY high load.

I think that is exactly the scenario I explained. Our build environment
uses (lots of) virtual machines to separate different build
environments, and these machines are quite heavily loaded.

In effect, the tests of make 3.81 failed on our build systems every now
and then. For 3.82 this is worse: I was sometimes able to make one of
the tests fail even on my local workstation with 8 cores and not much
else running. That was one of the targets/SECONDARY tests, though,
which do not use sleep at all.

With a sleep time factor of 4 I haven't seen a single test outside
targets/SECONDARY fail so far. A factor of 2 was not enough (I used that
first, and it's still used for compiling make for SLE9, and that fails
every now and then).

> Using configurable factors is not something I want to get into.  If it's
> really the case that if you have two invocations of "sleep", one "sleep
> 2" and one "sleep 4", running at the same time, and you can't guarantee
> that the "sleep 2" will finish before the "sleep 4", then we'll need to
> find a completely different method of testing parallel builds.

I already reported this issue when make version 3.81 was released, and
the discussion wasn't exactly long or productive.

> Maybe we could do something like use locking files (one process sleeps
> for a second and creates file X, another process waits for file X to
> exist then sleeps for a second and continues).  The problem here is
> doing it in a portable way, so it works on both UNIX and Windows (for
> example) systems is not so simple.

As the files don't have to be locked, but only created and tested for
existence (if using single-use names), I think this would be possible
to do. Let me think about it.
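
A minimal sketch of the create-and-test idea, written in Python purely
for illustration. The names wait_for, run_ordered and "first.done" are
made up for this example and are not part of the make test suite. The
ordering is enforced by the existence of a single-use marker file, not
by the relative length of two sleeps:

```python
import os
import tempfile
import threading
import time


def wait_for(path, poll=0.01):
    """Block until `path` exists. Only existence is tested, no locking."""
    while not os.path.exists(path):
        time.sleep(poll)


def run_ordered():
    """Run two jobs concurrently; the second may only proceed after the first."""
    order = []
    with tempfile.TemporaryDirectory() as tmp:
        # Single-use name inside a fresh directory, so no stale file can exist.
        marker = os.path.join(tmp, "first.done")

        def first():
            order.append("first")
            # Signal completion by creating the (empty) marker file.
            open(marker, "w").close()

        def second():
            wait_for(marker)
            order.append("second")

        t_second = threading.Thread(target=second)
        t_first = threading.Thread(target=first)
        t_second.start()  # start the waiting job first on purpose
        t_first.start()
        t_first.join()
        t_second.join()
    return order
```

However slowly either job is scheduled, the second one cannot record
itself before the first has created the marker.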

Also, the tests would then fail if, and mostly only if, a deadlock
occurs. Detecting deadlocks in a portable way without relying on
timeouts is complicated, and with timeouts we're only postponing the
main issue again.
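
To illustrate the timeout problem, here is a hedged sketch in Python
(wait_for_file and its parameters are hypothetical, not taken from the
test suite). A bounded wait turns a hang into a reported failure, but
the caller still has to guess a timeout value, and a False result does
not say whether the other side deadlocked or was merely slow:

```python
import os
import tempfile
import time


def wait_for_file(path, timeout, poll=0.01):
    """Return True if `path` appears within `timeout` seconds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    # One last check, in case the file appeared during the final sleep.
    return os.path.exists(path)


def demo():
    """Show both outcomes: a file that never appears and one that exists."""
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "never-created")
        present = os.path.join(tmp, "created")
        open(present, "w").close()
        return (wait_for_file(missing, timeout=0.05),
                wait_for_file(present, timeout=0.05))
```

Under heavy load a job that is only delayed would look exactly like the
"missing" case, which is the same weakness the sleep-based tests have.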

All in all this would be a worthwhile goal, but nontrivial.

Thanks

Matthias

P.S. If you have any thoughts about targets/SECONDARY, they would be
highly appreciated.

-- 
Matthias Hopf <mh...@suse.de>      __        __   __
Maxfeldstr. 5 / 90409 Nuernberg   (_   | |  (_   |__          m...@mshopf.de
Phone +49-911-74053-715           __)  |_|  __)  |__  R & D   www.mshopf.de

_______________________________________________
Bug-make mailing list
Bug-make@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-make
