I'm not going to address the remote execution topic since it sounds like you already have the solution and are not looking for help. However, I do have fairly extensive experience with the NFS/retry area, so I will try to contribute there.
First, I don't think what Paul says:

> As for your NFS issue, another option would be to enable the .ONESHELL
> feature available in newer versions of GNU make: that will ensure that
> all lines in a recipe are invoked in a single shell, which means that
> they should all be invoked on the same remote host.

is sufficient. Consider the typical case of compiling foo.c to foo.o and linking it into foo.exe. Typically, and correctly, those actions would be in two separate recipes, which in a distributed-build scenario could run on different hosts, so the linker may not find the .o file from a previous recipe. Here .ONESHELL cannot help, since they're different recipes.

In my day job we use a product from IBM called LSF (Load Sharing F-something, https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_welcome.html) which exists to distribute jobs over a server farm (typically using NFS) according to various metrics like load and free memory and so on. Part of the LSF package is a program called lsmake (https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_command_ref/lsmake.1.html), which under the covers is a version of GNU make with enhancements to enable remote/distributed recipes, and which also adds the retry-with-delay feature Nikhil requested. Since GNU make is GPL, IBM is required to make its package of enhancements available under the GPL as well. Much of it is not of direct interest to the open-source community because it's all about communicating with IBM's proprietary daemons, but the retry logic could probably be taken directly from the patch. At the very least, if retries were to be added to GNU make per se, it would be nice if the flags were compatible with lsmake's.

However, my personal belief is that retries are God's way of telling us to think harder and better. Retrying (and worse, delay-and-retry) is a form of defeatism which I call "sleep and hope". Computers are deterministic; there's always a root cause which can usually be found and addressed with sufficient analysis. Granted, there are cases where you understand the problem but can't address it for administrative/permissions/business reasons, but that can't be known until the problem is understood. NFS caching is the root cause of unreliable distributed builds, as you've already described, but most or all of these issues can be addressed with a less blunt instrument than sleep-and-retry.

Even the LSF engineers threw up their hands and did retries, but what we did here was take their patch, which at last check was still targeted at 3.81, and, while porting it to 4.1, add some of the cache-flushing strategies detailed below. This has solved most if not all of our NFS sync problems. Caveat: most of our people still use the LSF retry logic in addition, because they're not as absolutist as I am and just want to get their jobs done (go figure), which makes it harder to determine what percentage of problems is solved by cache flushing vs. retries, but I'm pretty sure flushing has resolved the great majority of them. One problem with sleep-and-hope is that there's no amount of time guaranteed to be enough, so you're just driving the incidence rate down, not fixing it.

Since we were already working with a hacked version of GNU make we found it most convenient to implement flushing directly in the make program, but it can also be done within recipes. In fact we have 3 different implementations of the same NFS cache-flushing logic:

1. Directly within our enhanced version of lsmake.
2. In a standalone binary called "nfsflush".
3. In a Python script called nfsflush.py.

The Python script is a lab for trying out new strategies but it's too slow for production use. The binary is a faster version of the same techniques for direct use in recipes, and that same C code is linked directly into lsmake as well. Here's the usage message of our Python script:

    $ nfsflush.py --help
    usage: nfsflush.py [-h] [-f] [-l] [-r] [-t] [-u] [-V] path [path ...]

    positional arguments:
      path             directory paths to flush

    optional arguments:
      -h, --help       show this help message and exit
      -f, --fsync      Use fsync() on named files
      -l, --lock       Lock and unlock the named files
      -r, --recursive  flush recursively
      -t, --touch      additional flush action - touch and remove a temp file
      -u, --upflush    flush parent directories back to the root
      -V, --verbose    increment verbosity level

    Flush the NFS filehandle caches of NFS directories.

    Newly created files are sometimes unavailable via NFS for a period
    of time due to filehandle caching, leading to apparent race problems.
    See http://tss.iki.fi/nfs-coding-howto.html#fhcache for details.
    This script forces a flush using techniques mentioned in the URL. It
    can optionally do so recursively.

    This always does an opendir/closedir sequence on each directory
    visited, as described in the URL, because that's cheap and safe and
    often sufficient. Other strategies, such as creating and removing a
    temp file, are optional.

    EXAMPLES:
        nfsflush.py /nfs/path/...
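For recipes that can't call an in-house tool, those two cheap actions (the always-done directory read and the optional touch-and-remove) can be approximated with ordinary shell commands. The following is only a rough sketch of that idea, not our nfsflush code; the function name and the use of ls as a stand-in for opendir/closedir are mine:

    # flush_dir: refresh this client's cached view of an NFS directory.
    # Rough stand-in for the script's default action and its --touch option.
    flush_dir() {
        dir=${1:-.}
        # Reading the directory forces an opendir/readdir on this client;
        # cheap, safe, and often sufficient.
        ls -la "$dir" > /dev/null 2>&1
        # Optionally force a directory write (the -t/--touch idea) by
        # creating and removing a temp file; needs write permission.
        tmp="$dir/.nfsflush.$$"
        touch "$tmp" 2> /dev/null && rm -f "$tmp"
    }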
The most important thing is to read the URL given above and/or to google for similar resources, of which there are many. While I'm not an NFS guru myself, the summary of my understanding is that NFS caches all sorts of things (metadata like atime/mtime, directory updates, etc.) with varying degrees of aggression according to the NFS vendor and internal configuration. We've seen substantial variation between NAS providers such as NetApp, EMC, etc., so much depends on whose NFS server you're using. However, the NFS spec _requires_ that caches be flushed on a write operation, so all implementations will do that.

Bottom line, the most common failure case is as mentioned above: foo.o is compiled on host A and immediately linked on host B. The close() system call following the final write() of foo.o on host A will cause its data to be flushed. Similarly, I *believe* the directory write (assuming foo.o is newly created and not just updated) will cause the filehandle cache to be flushed. Thus, after these two write operations (directory and file) the NFS server will know about the new foo.o as soon as it's created. The problem typically arises on host B: no write operation has taken place there after foo.o was created on A, so nothing has told host B to update its caches; as a result it doesn't know foo.o exists and the link fails with ENOENT. All the flushing techniques in the script above are attempts to address this.

One takeaway from all this is that even if you do retries, a "dumb" retry is immeasurably enhanced by adding a flush. In other words, the most efficient retry formula in a distributed-build scenario would be:

    <recipe> || flush || <recipe>

This never flushes a cache unless the first attempt fails. It presumes that NFS implementors and admins know what they're doing and that caching helps with performance, so it isn't done unless needed. This is what we built into our variant of lsmake. However, the same can also be done in the shell.
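In a recipe, that would look roughly like the sketch below (flush_dir is the helper sketched above, or substitute the nfsflush binary; the link command is just the earlier foo example). The flush and the single retry happen only when the first attempt fails, and the overall exit status is that of the retry, so make still sees a genuine failure:

    # The "<recipe> || flush || <recipe>" idea spelled out: flush and
    # retry once, but only if the first attempt fails.
    cc -o foo.exe foo.o || { flush_dir . ; cc -o foo.exe foo.o ; }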
Details about the implemented cache-flushing techniques: the filehandle cache is the biggest source of problems in distributed builds, and the simplest solution for it seems to be opening and reading the directory entry. Thus our script and its parallel C implementation always do that. We've also seen cases where forcing a directory write operation is required, which the -t, --touch option does. Sometimes you can't easily enumerate all the directories involved (vpath etc.), so the recurse-downward (-r) and recurse-upward (-u) flags may be helpful, though they (especially -u) may also be overkill. The -f and -l options were added based on advice found on the net but have not been shown to be helpful in our environment.

Some techniques may be of limited utility because they require write and/or ownership privileges. For instance, I've seen statements that umounts, even failed umounts, will force flushes. Thus a command like "cd <dir> && umount $(pwd)" would have to fail, since the mount is busy, but would flush as a side effect. However, I believe this requires root privileges, so it's not helpful in the normal case.

In summary: although I don't believe in retries, if they're going to be used I think they should be implemented in a shell wrapper program which could be passed to make as SHELL=<wrapper>, and the wrapper should use flushing in addition to, or instead of, retries. We didn't do it that way, but I think our nfsflush program could just as well have been implemented as, say, "nfsshell", such that "nfsshell [other-options] -c <recipe>" would run the recipe along with added flushing and retrying options.
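To be clear, no such nfsshell exists; the following is only a hypothetical sketch of what a minimal wrapper along those lines might look like. It assumes the default .SHELLFLAGS of -c and uses a bare directory read where a real wrapper would presumably call nfsflush:

    #!/bin/sh
    # Hypothetical retry-with-flush wrapper, used as: make SHELL=/path/to/nfsshell
    # make runs each recipe line as `$SHELL -c '<line>'`, so "$@" below is the
    # -c flag plus the recipe text; hand it straight to the real shell.
    /bin/sh "$@" && exit 0
    # First attempt failed: flush the filehandle cache for the current
    # directory (a real wrapper might call nfsflush here), then retry once.
    ls -la . > /dev/null 2>&1
    exec /bin/sh "$@"

Since the wrapper exits with the status of the retry, genuine failures still stop the build as usual.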
I agree with Paul that I see no reason to implement any of these features, retry and/or flush, directly in make.

David

On Mon, Sep 2, 2019 at 6:05 AM Paul Smith <psm...@gnu.org> wrote:
> On Sun, 2019-09-01 at 23:23 -0700, Kaz Kylheku (gmake) wrote:
> > If your R&D team would allow you to add just one line to the
> > legacy GNU Makefile to assign the SHELL variable, you can assign that
> > to a shell wrapper program which performs command re-trying.
>
> You don't have to add any lines to the makefile. You can reset SHELL
> on the command line, just like any other make variable:
>
> make SHELL=/my/special/sh
>
> You can even override it only for specific targets using the --eval
> command line option:
>
> make --eval 'somerule: SHELL := /my/special/sh'
>
> Or, you can add '-f mymakefile.mk -f Makefile' options to the command
> line to force reading of a personal makefile before the standard
> makefile.
>
> Clearly you can modify the command line, otherwise adding new options
> to control a putative retry on error option would not be possible.
>
> As for your NFS issue, another option would be to enable the .ONESHELL
> feature available in newer versions of GNU make: that will ensure that
> all lines in a recipe are invoked in a single shell, which means that
> they should all be invoked on the same remote host. This can also be
> done from the command line, as above. If your recipes are written well
> it should Just Work. If they aren't, and you can't fix them, then
> obviously this solution won't work for you.
>
> Regarding changes to set re-invocation on failure, at this time I don't
> believe it's something I'd be willing to add to GNU make directly,
> especially not an option that simply retries every failed job. This is
> almost never useful (why would you want to retry a compile, or link, or
> similar? It will always just fail again, take longer, and generate
> confusing duplicate output--at best).
>
> The right answer for this problem is to modify the makefile to properly
> retry those specific rules which need it.
>
> I commiserate with you that your environment is static and you're not
> permitted to modify it, however adding new specialized capabilities to
> GNU make so that makefiles don't have to be modified isn't a design
> philosophy I want to adopt.