How does this do for you?
On Thu, Jul 4, 2019 at 7:15 AM Matthias Klose <[email protected]> wrote:
>
> I'm running into some issues building LTO+profile-enabled configurations in
> a constrained build environment (the buildds), which have four cores and 16GB
> of RAM.
>
> configured for all frontends (maximum number of LTO links) and configured with
>
> --enable-bootstrap \
> --with-build-config=bootstrap-lto-lean \
> --enable-link-mutex
>
> and building the make profiledbootstrap-lean target.
>
> Most builds time out after 150 minutes.
>
> A typical LTO link runs for around one minute on this hardware; however, an
> LTO link with -fprofile-use runs for up to three hours.
>
> So gcc/lock-and-run.sh runs the first LTO link, all the others wait 300
> seconds, then remove the "stale" locks, and everything runs in parallel ...
> which surprisingly goes well, because -flto=jobserver is in effect, so I don't
> see any memory constraints yet.
>
> The machine then starts building all front-ends, but apparently is not
> overloaded, as -flto=jobserver is in effect. However there is no output, and
> that triggers the timeout. Richi mentioned on IRC that the LTO links only have
> buffered output (unless you run in debug mode), and that is only emitted once
> the link finishes. However even with unbuffered output, there could be times
> when nothing is happening, no warnings?
>
> I'm currently experimenting with a modified lock-and-run.sh, which basically
> sets the delay for releasing the "stale" locks to 30 min instead of 5 min, runs
> the LTO link in the background and checks for the status of the background
> job, emitting some "running ..." messages while not finished. Still adjusting
> some parameters, but at least that succeeds on some of my configurations.
>
> The locking mechanism was introduced in 2013,
> https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00001.html
>
> lock-and-run.sh should probably be modified not to release the "stale" locks
> based on a fixed timeout value. How?
>
> While the "no-output" problem can be fixed in the lock script as well
> (attached), this doesn't apply to third party apps. Having unbuffered output
> and/or an option to print progress would be beneficial.
>
> Matthias
>
>
>
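As an aside on the no-output problem: the wrapper Matthias describes (run the
link in the background, print "running ..." until it finishes) might look
roughly like the sketch below. The function name run_with_heartbeat and its
interval argument are hypothetical, made up for illustration; this is not the
attached script and not part of lock-and-run.sh.

```sh
# run_with_heartbeat INTERVAL CMD [ARGS...]   (hypothetical helper)
# Run CMD in the background and print a progress line to stderr every
# INTERVAL seconds until it exits.  Returns CMD's exit status.
run_with_heartbeat() {
    interval=$1; shift
    "$@" &
    child=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$child" 2>/dev/null; do
        sleep 1
        elapsed=$((elapsed + 1))
        if [ $((elapsed % interval)) -eq 0 ]; then
            echo "running ... ${elapsed}s (PID $child)" >&2
        fi
    done
    # Collect the exit status of the background job.
    wait "$child"
}

# Example: run_with_heartbeat 60 $prog "$@"
```

The progress lines go to stderr, so the command's own stdout is left alone.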
commit c570f3f4751385153292c85c2f38dc78e9443923
Author: Jason Merrill <[email protected]>
Date: Sat Sep 14 14:02:20 2019 -0400
* lock-and-run.sh: Check for process existence rather than timeout.
diff --git a/gcc/lock-and-run.sh b/gcc/lock-and-run.sh
index 3a6a84c253a..b1a4a4c8220 100644
--- a/gcc/lock-and-run.sh
+++ b/gcc/lock-and-run.sh
@@ -5,29 +5,28 @@ lockdir="$1" prog="$2"; shift 2 || exit 1
# Remember when we started trying to acquire the lock.
count=0
-touch lock-stamp.$$
-trap 'rm -r "$lockdir" lock-stamp.$$' 0
+trap 'rm -rf "$lockdir"' 0
until mkdir "$lockdir" 2>/dev/null; do
# Say something periodically so the user knows what's up.
if [ `expr $count % 30` = 0 ]; then
- # Reset if the lock has been renewed.
- if [ -n "`find \"$lockdir\" -newer lock-stamp.$$`" ]; then
- touch lock-stamp.$$
- count=1
- # Steal the lock after 5 minutes.
- elif [ $count = 300 ]; then
- echo removing stale $lockdir >&2
- rm -r "$lockdir"
+ # Check for stale lock.
+ pid="`(cd $lockdir; echo *)`"
+ if ps "$pid" >/dev/null; then
+ echo waiting $count sec to acquire $lockdir from PID $pid>&2
+ found=$pid
else
- echo waiting to acquire $lockdir >&2
+ echo PID $pid is dead, removing stale $lockdir >&2
+ rm -r "$lockdir"
fi
fi
sleep 1
count=`expr $count + 1`
done
+touch $lockdir/$$
+echo acquired $lockdir after $count seconds >&2
echo $prog "$@"
$prog "$@"
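For anyone who wants to try the idea outside the GCC tree, here is a minimal
standalone sketch of the same approach: the lock holder records its PID as a
file name inside the lock directory, and a waiter removes the lock only when
that PID no longer exists. Note the differences from the patch above, which
are my assumptions for this sketch: it uses kill -0 rather than ps for the
liveness test, it checks on every iteration, and it treats an empty lock
directory as "holder has not recorded its PID yet" rather than as stale. The
names acquire_lock and release_lock are hypothetical.

```sh
# acquire_lock LOCKDIR   (hypothetical helper)
# Spin until LOCKDIR can be created, stealing it if the recorded holder
# PID is no longer running, then record our own PID inside it.
acquire_lock() {
    lockdir=$1
    until mkdir "$lockdir" 2>/dev/null; do
        # The holder's PID is the name of the (single) file in the lock dir.
        pid=$(ls "$lockdir" 2>/dev/null | head -n 1)
        if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
            echo "PID $pid is dead, removing stale $lockdir" >&2
            rm -rf "$lockdir"
        else
            # Holder is alive, or has not written its PID yet: keep waiting.
            sleep 1
        fi
    done
    touch "$lockdir/$$"
}

# release_lock LOCKDIR   (hypothetical helper)
release_lock() {
    rm -rf "$1"
}
```

Caveats: kill -0 also fails when the process belongs to another user, so this
assumes all participants run under the same account. PID reuse can make a dead
holder look alive, and two waiters can still race when removing a stale lock;
the sketch does not try to handle either case.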