Hello, Jason,

On Oct 14, 2019, Jason Merrill <ja...@redhat.com> wrote:
> Alex, you had a lot of helpful comments when I first wrote this, any
> thoughts on this revision?

I think the check of the pid file could be made slightly simpler and
cheaper if we created it using:

  echo $$ > $lockdir/pidT && mv $lockdir/pidT $lockdir/pid

instead of

> +      touch $lockdir/$$
> +      pid="`(cd $lockdir; echo *)`"

The ""s are implicit in a shell assignment, though there really
shouldn't be more than one PID-named file in the dir.  With the change
suggested above, this would become:

  pid=`cat $lockdir/pid 2>/dev/null`

There's a slight possibility of hitting this right between the creation
of the dir and the creation of the pid file, thus the 2>/dev/null.

> +      if ps "$pid" >/dev/null; then

could be tested with much lower overhead:

  if test -z "$pid" || kill -0 $pid; then

though it might make sense to have a different test and error message
for the case of the absent pid file.  We might also wish to use
different lock-breaking logic for that case, too, e.g. checking that
the timestamp of the dir didn't change by comparing `ls -ld $lockdir`
with what we got 30 seconds before.  If it changed, or the output is
now empty, we just lost the race again.

It's unlikely that the dir would remain unchanged for several seconds
without the pid file, so if we get the same timestamp after 30 seconds,
it's likely that something went wrong with the lock holder, though it's
not impossible to imagine a scenario in which the lock program that
just succeeded in creating the dir got stopped (^Z) or killed (-9) just
before creating the PID file.

Even then, maybe breaking the lock is not such a great idea in
general...  Though mkdir is an operation that forces a synchronization,
reading a file without a filesystem lock isn't.  The rename alleviates
that a bit, but it's not entirely unreasonable for an attempt to read
the file to cache the absence of the file and not notice a creation
shortly afterward.
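Pulling the pieces above together, a minimal sketch of the
mkdir-plus-rename scheme might look like the following (the lock path,
the 1-second retry, and the function names are illustrative choices of
mine, not the actual patch):

```shell
#!/bin/sh
# Sketch of mkdir-based locking with the holder's PID published via
# an atomic rename.  $lockdir here is a throwaway demo path.
lockdir="${TMPDIR:-/tmp}/lockdemo.$$"

acquire_lock () {
  while ! mkdir "$lockdir" 2>/dev/null; do
    # The pid file may briefly not exist yet, hence 2>/dev/null.
    pid=`cat "$lockdir/pid" 2>/dev/null`
    if test -n "$pid" && ! kill -0 "$pid" 2>/dev/null; then
      # Recorded holder is gone: break the stale lock.  (Note that
      # kill -0 also fails for another user's live process.)
      rm -rf "$lockdir"
    else
      # Live holder, or pid not written yet; the absent-pid case
      # would really need the timestamp check discussed above.
      sleep 1
    fi
  done
  # Publish our PID atomically: write under a temp name, then rename.
  echo $$ > "$lockdir/pidT" && mv "$lockdir/pidT" "$lockdir/pid"
}

release_lock () {
  rm -rf "$lockdir"
}
```

Because readers only ever see the pid file after the rename, they can
never observe a half-written PID, which is the point of the pidT
indirection.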
This would be a lot more of an issue in case of contention for the lock
between different clients of a shared networked filesystem, though we
might imagine highly parallel systems to eventually hit such issues as
well.  But just the possibility of contention across a shared network
filesystem would give me pause, since checking for a local process with
the same PID would do no good.  And then, kill -0 would not succeed if
the lock holder was started by a different user, unlike ps.

What if we printed an error message suggesting the command to clean up,
and then errored out, instead of retrying forever or breaking the lock
and proceeding?  Several programs that rely on lock files (git and svn
come to mind) seem to take such an approach these days, presumably
because of all the difficulties in automating the double-checking in
all potential scenarios.

-- 
Alexandre Oliva, freedom fighter    he/him    https://FSFLA.org/blogs/lxo
Be the change, be Free!             FSF VP & FSF Latin America board member
GNU Toolchain Engineer                        Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara