How does this do for you?

On Thu, Jul 4, 2019 at 7:15 AM Matthias Klose <d...@ubuntu.com> wrote:
>
> I'm running into some issues building LTO+profiled enabled configurations in
> some constrained build environment called buildds, having four cores and 16GB
> of RAM.
>
> configured for all frontends (maximum number of LTO links) and configured with
>
>   --enable-bootstrap \
>   --with-build-config=bootstrap-lto-lean \
>   --enable-link-mutex
>
> and building the make profiledbootstrap-lean target.
>
> Most builds time out after 150 minutes.
>
> A typical LTO link runs for around one minute on this hardware, however a LTO
> link with -fprofile-use runs for up to three hours.
>
> So gcc/lock-and-run.sh runs the first lto-link, waits for all other 300
> seconds, then removes the "stale" locks, and runs everything in parallel ...
> Which surprisingly goes well, because -flto=jobserver is in effect, so I don't
> see any memory constraints yet.
>
> The machine then starts building all front-ends, but apparently is not
> overloaded, as -flto=jobserver is in effect.  However there is no output, and
> that triggers the timeout.  Richi mentioned on IRC that the LTO links only have
> buffered output (unless you run in debug mode), and that is only emitted once
> the link finishes.  However even with unbuffered output, there could be times
> when nothing is happening, no warnings?
>
> I'm currently experimenting with a modified lock-and-run.sh, which basically
> sets the delay for releasing the "stale" locks to 30min instead of 5 min, runs
> the LTO link in the background and checks for the status of the background
> job, emitting some "running ..." messages while not finished.  Still adjusting
> some parameters, but at least that succeeds on some of my configurations.
>
> The locking mechanism was introduced in 2013,
> https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00001.html
>
> lock-and-run.sh should probably modified not to release the "stale" locks
> based on a fixed timeout value.  How?
>
> While the "no-output" problem can be fixed in the lock script as well
> (attached), this doesn't apply to third party apps.  Having unbuffered output
> and/or an option to print progress would be beneficial.
>
> Matthias
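
For reference, the keep-alive idea described in the quoted message (run the
link in the background and print "running ..." while it is still alive) could
be sketched roughly as follows.  This is only an illustration under my own
assumptions, not the script that was attached and not part of the patch below;
the function name run_with_heartbeat and its interval argument are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of gcc/lock-and-run.sh): run a command in
# the background and print a heartbeat on stderr while it is still running,
# so that a build daemon watching for output does not time out.
run_with_heartbeat () {
    interval="$1"; shift        # seconds between heartbeats (assumed parameter)
    "$@" &
    job=$!
    elapsed=0
    # kill -0 delivers no signal; it only tests whether the PID still exists.
    while kill -0 "$job" 2>/dev/null; do
        sleep 1
        elapsed=$((elapsed + 1))
        if [ $((elapsed % interval)) -eq 0 ]; then
            echo "running ... ($elapsed sec)" >&2
        fi
    done
    # Collect and propagate the exit status of the background job.
    wait "$job"
}
```

Something like "run_with_heartbeat 60 $prog "$@"" in place of the final
"$prog "$@"" line would be one way to use it.  Polling once a second keeps the
loop simple; a longer sleep would lower the overhead but also delay noticing
that the job has finished.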
commit c570f3f4751385153292c85c2f38dc78e9443923
Author: Jason Merrill <ja...@redhat.com>
Date:   Sat Sep 14 14:02:20 2019 -0400

            * lock-and-run.sh: Check for process existence rather than timeout.

diff --git a/gcc/lock-and-run.sh b/gcc/lock-and-run.sh
index 3a6a84c253a..b1a4a4c8220 100644
--- a/gcc/lock-and-run.sh
+++ b/gcc/lock-and-run.sh
@@ -5,29 +5,28 @@ lockdir="$1" prog="$2"; shift 2 || exit 1
 
 # Remember when we started trying to acquire the lock.
 count=0
-touch lock-stamp.$$
 
-trap 'rm -r "$lockdir" lock-stamp.$$' 0
+trap 'rm -rf "$lockdir"' 0
 
 until mkdir "$lockdir" 2>/dev/null; do
     # Say something periodically so the user knows what's up.
     if [ `expr $count % 30` = 0 ]; then
-        # Reset if the lock has been renewed.
-        if [ -n "`find \"$lockdir\" -newer lock-stamp.$$`" ]; then
-            touch lock-stamp.$$
-            count=1
-        # Steal the lock after 5 minutes.
-        elif [ $count = 300 ]; then
-            echo removing stale $lockdir >&2
-            rm -r "$lockdir"
+        # Check for stale lock.
+        pid="`(cd $lockdir; echo *)`"
+        if ps "$pid" >/dev/null; then
+            echo waiting $count sec to acquire $lockdir from PID $pid>&2
+            found=$pid
         else
-            echo waiting to acquire $lockdir >&2
+            echo PID $pid is dead, removing stale $lockdir >&2
+            rm -r "$lockdir"
         fi
     fi
     sleep 1
     count=`expr $count + 1`
 done
 
+touch $lockdir/$$
+echo acquired $lockdir after $count seconds >&2
 echo $prog "$@"
 $prog "$@"