How does this do for you?

On Thu, Jul 4, 2019 at 7:15 AM Matthias Klose <d...@ubuntu.com> wrote:
>
> I'm running into some issues building LTO+profile-enabled configurations in
> a constrained build environment (the buildds), with four cores and 16GB of
> RAM.
>
> configured for all frontends (maximum number of LTO links) and configured with
>
>   --enable-bootstrap \
>   --with-build-config=bootstrap-lto-lean \
>   --enable-link-mutex
>
> and building the make profiledbootstrap-lean target.
>
> Most builds time out after 150 minutes.
>
> A typical LTO link runs for around one minute on this hardware; however, an
> LTO link with -fprofile-use runs for up to three hours.
>
> So gcc/lock-and-run.sh runs the first LTO link, the other links wait 300
> seconds, then remove the "stale" lock and run everything in parallel ...
> Which surprisingly goes well, because -flto=jobserver is in effect, so I
> don't see any memory constraints yet.
>
> The machine then starts building all front-ends, but apparently is not
> overloaded, as -flto=jobserver is in effect.  However there is no output, and
> that triggers the timeout. Richi mentioned on IRC that the LTO links only have
> buffered output (unless you run in debug mode), and that is only emitted once
> the link finishes.  However even with unbuffered output, there could be times
> when nothing is happening, no warnings?
>
> I'm currently experimenting with a modified lock-and-run.sh, which basically
> sets the delay for releasing the "stale" locks to 30 min instead of 5 min,
> runs the LTO link in the background, and checks the status of the background
> job, emitting some "running ..." messages while it has not finished.  Still
> adjusting some parameters, but at least that succeeds on some of my
> configurations.
>
> The locking mechanism was introduced in 2013,
> https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00001.html
>
> lock-and-run.sh should probably be modified not to release the "stale" locks
> based on a fixed timeout value. How?
>
> While the "no-output" problem can be fixed in the lock script as well
> (attached), this doesn't apply to third party apps.  Having unbuffered output
> and/or an option to print progress would be beneficial.
>
> Matthias
>
>
>
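For reference, the keep-alive idea from the quoted message (run the link in
the background and print a message while it is still running) could be
sketched roughly as below.  This is not the attached script; the function
name run_with_heartbeat and its interval argument are made up for
illustration, and it assumes a POSIX sh.

```shell
#!/bin/sh
# Sketch only: run_with_heartbeat is a hypothetical helper, not part of
# gcc/lock-and-run.sh.
#
# Usage: run_with_heartbeat INTERVAL COMMAND [ARGS...]
run_with_heartbeat() {
    hb_interval=$1; shift

    # Start the real command in the background and remember its PID.
    "$@" &
    hb_pid=$!
    hb_elapsed=0

    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$hb_pid" 2>/dev/null; do
        sleep 1
        hb_elapsed=$((hb_elapsed + 1))
        # Emit a progress line every INTERVAL seconds so that the build
        # log is never silent for long.
        if [ $((hb_elapsed % hb_interval)) -eq 0 ]; then
            echo "running ... ($hb_elapsed sec)" >&2
        fi
    done

    # Collect and propagate the command's exit status.
    wait "$hb_pid"
}
```

Something like `run_with_heartbeat 60 $prog "$@"` would then print one line
per minute on stderr.  It only hides the symptom for the build being
wrapped; it does nothing for third-party users of the LTO link.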
commit c570f3f4751385153292c85c2f38dc78e9443923
Author: Jason Merrill <ja...@redhat.com>
Date:   Sat Sep 14 14:02:20 2019 -0400

            * lock-and-run.sh: Check for process existence rather than timeout.

diff --git a/gcc/lock-and-run.sh b/gcc/lock-and-run.sh
index 3a6a84c253a..b1a4a4c8220 100644
--- a/gcc/lock-and-run.sh
+++ b/gcc/lock-and-run.sh
@@ -5,29 +5,28 @@ lockdir="$1" prog="$2"; shift 2 || exit 1
 
 # Remember when we started trying to acquire the lock.
 count=0
-touch lock-stamp.$$
 
-trap 'rm -r "$lockdir" lock-stamp.$$' 0
+trap 'rm -rf "$lockdir"' 0
 
 until mkdir "$lockdir" 2>/dev/null; do
     # Say something periodically so the user knows what's up.
     if [ `expr $count % 30` = 0 ]; then
-	# Reset if the lock has been renewed.
-	if [ -n "`find \"$lockdir\" -newer lock-stamp.$$`" ]; then
-	    touch lock-stamp.$$
-	    count=1
-	# Steal the lock after 5 minutes.
-	elif [ $count = 300 ]; then
-	    echo removing stale $lockdir >&2
-	    rm -r "$lockdir"
+	# Check for stale lock.
+	pid="`(cd $lockdir; echo *)`"
+	if ps "$pid" >/dev/null; then
+	    echo waiting $count sec to acquire $lockdir from PID $pid>&2
+	    found=$pid
 	else
-	    echo waiting to acquire $lockdir >&2
+	    echo PID $pid is dead, removing stale $lockdir >&2
+	    rm -r "$lockdir"
 	fi
     fi
     sleep 1
     count=`expr $count + 1`
 done
 
+touch $lockdir/$$
+echo acquired $lockdir after $count seconds >&2
 echo $prog "$@"
 $prog "$@"
 

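To try the stale-lock test outside the build, the check can be pulled out
into a small function.  The sketch below is an illustration, not part of the
patch: lock_holder_alive is a made-up name, and it uses kill -0 in place of
ps, which assumes the lock holder runs as the same user as the caller.

```shell
#!/bin/sh
# Sketch only: lock_holder_alive is a hypothetical helper, not part of
# gcc/lock-and-run.sh.
#
# Return 0 if LOCKDIR names a live process, 1 otherwise.
lock_holder_alive() {
    lh_dir=$1

    # The lock holder records its PID as an empty file inside the lock
    # directory; take the first (normally the only) entry.
    lh_pid=`ls "$lh_dir" 2>/dev/null | head -n 1`

    # No PID file: nothing to check, so report "not alive".
    [ -n "$lh_pid" ] || return 1

    # kill -0 sends no signal; it only tests whether the PID exists
    # (assumption: same user, otherwise it can fail with EPERM).
    kill -0 "$lh_pid" 2>/dev/null
}
```

Note that an empty lock directory counts as stale here, so a caller may want
to tolerate the short window between the holder's mkdir and touch before
removing anything.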