Re: wait -n misses signaled subprocess

Steven Pelley Tue, 30 Jan 2024 06:12:21 -0800

> > It does look in the table of saved exit statuses, returning 1.
>
> It doesn't. In this case, the code path it follows marks the job as dead
> but doesn't mark it as notified (since it exited normally), so it's still
> in the jobs list when `wait -n' is called, and available for returning.
> That's probably a bug there.


Got it.  So wait -n is intended to behave just as the documentation
says -- "next" job -- and if there's a bug it's with how
normally-exiting processes are handled, not signal-exiting processes.
Thank you for your patience.

> > There's also an interaction in that "wait" will only look at the
> > terminated table if "-n" is not specified *and* ids are specified.
>
> This is to maintain POSIX semantics, with extensions. This is one of the
> issues -- should `wait -n' with arguments look for terminated processes
> in that table, the way `wait' without options does?

Yes, I do want wait -n to look in the terminated table, at least for
my use case of responding to jobs as they finish, one at a time, as soon as
possible.  I don't think wait -n can reliably do this since there is
always a race between a job finishing/being handled, the next job
finishing, and the subsequent call to wait -n.  Even if I query "jobs"
to see if multiple jobs have terminated, the next finishing job could
still race.  You've pointed out clearly that my mental model of wait
-n was wrong so please bear with me if I still don't have this right.

Is there some other best practice for this use case?  It might be "use
a SIGCHLD handler and query jobs to see what jobs have terminated,
then call wait <pid> on each" or "I don't recommend using bash/sh for
this."  Obviously I could also be overlooking some aspect of wait -n
or other bash features that would help here.

I _don't_ want bash to maintain some sort of internal state about
which jobs have and haven't been returned by wait -n, which would be
complicated and brittle (this is what my mental model was).  I'd want
it to look in the terminated table for finished jobs amongst the
provided list of pids, and then I'd manage the list of pids myself,
removing pids that were previously returned from wait -n.  This is a
change in semantics and might introduce inconsistencies and be difficult
to implement; I'm just describing what I think would be useful for my
specific needs.
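
To make the semantics I'm describing concrete, here is a rough sketch
that approximates them today with polling instead of wait -n.  The
helper name wait_any and the variables FINISHED_PID and FINISHED_STATUS
are made up for this example; nothing here is an existing bash feature.
It assumes the shell still remembers the exit status of a terminated
background child when "wait <pid>" is called, that pids are not reused
while the loop runs, and that sleep accepts fractional seconds.

```sh
# Hypothetical helper (not a builtin): poll until one of the given pids
# is gone, then collect its status with plain "wait <pid>".
# Sets FINISHED_PID and FINISHED_STATUS; both names are invented here.
wait_any() {
    [ "$#" -gt 0 ] || return 1
    while :; do
        for FINISHED_PID in "$@"; do
            # kill -0 sends no signal; it starts failing once the shell
            # has reaped the child (assumes the pid is not reused).
            if ! kill -0 "$FINISHED_PID" 2>/dev/null; then
                wait "$FINISHED_PID"
                FINISHED_STATUS=$?
                return 0
            fi
        done
        sleep 0.1    # polling interval; fractional sleep is an extension
    done
}

# Two example jobs: one exits normally, one dies from a signal.
sh -c 'sleep 1; exit 3' &
pids=$!
sh -c 'kill -TERM $$' &
pids="$pids $!"

results=
while [ -n "$pids" ]; do
    wait_any $pids    # unquoted on purpose: one argument per pid
    results="$results $FINISHED_STATUS"
    # The caller manages the list: drop the pid that was just returned.
    remaining=
    for p in $pids; do
        [ "$p" = "$FINISHED_PID" ] || remaining="$remaining $p"
    done
    pids=$remaining
done
```

The point is only the shape of the loop: the caller owns the pid list
and the shell keeps no record of what has already been returned.
Polling is a poor substitute for a blocking wait, which is why I'd
like wait -n itself to consult the terminated table.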

A bit of brainstorming: between Linux's pidfds and BSD's
kqueue/process descriptors one ought to be able to build this as an
external command that polls for non-child processes to terminate.  It
couldn't return an exit status, but it could at least indicate which
process finished or couldn't be found and thus had already finished.
Then you could use POSIX "wait <pid>" to get the exit status and be
guaranteed that it wouldn't block (a simple timeout option to wait
might be useful here for cases where bash's child process may not be
visible to an external command).  I'm not aware of anything like this
existing, but it would be a nice way to separate this functionality
from the shell, reduce the number of options in wait, and support
other shells.

Again, thanks for your patience Chet,
Steve
