'wait -n' with and without id arguments

Zachary Santer Sat, 20 Jul 2024 10:51:17 -0700

Was "waiting for process substitutions"

For context:
$ git show -q origin/devel
commit 6c703092759ace29263ea96374e18412c59acc7f (origin/devel)
Author: Chet Ramey <chet.ra...@case.edu>
Date:   Thu Jul 18 16:48:17 2024 -0400

    job control cleanups; wait -n can return terminated jobs if
supplied pid arguments; wait -n can wait for process substitutions if
supplied pid arguments

On Thu, Jul 18, 2024 at 12:36 PM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 7/14/24 8:40 PM, Zachary Santer wrote:
>
> > On Fri, Jul 5, 2024 at 2:38 PM Chet Ramey <chet.ra...@case.edu> wrote:
> >>
> >> There is code tagged
> >> for bash-5.4 that allows `wait -n' to look at these exited processes as
> >> long as it's given an explicit set of pid arguments.
> >
> > I agree with all the knowledgeable people here telling you that the
> > way 'wait -n' is still implemented in bash-5.3-alpha is obviously
> > wrong, but I also want to point out that the way you plan to change
> > its behavior in bash-5.4 still won't allow Greg's example below to
> > work reliably.
>
> OK, but how would you do that? If a job has already terminated, and been
> removed from the jobs list, how would you know that `wait -n' without pid
> arguments should return it? There can be an arbitrary number of pids on
> the list of saved pids and statuses -- the only way to clear it using wait
> is to run `wait' without arguments.
>
> You can say not to remove the job from the jobs list, which gets into the
> same notification issues that originally started this discussion back in
> January, and I have made a couple of changes in that area in response to
> the original report that I think will address some of those. But once you
> move the job from the jobs list to this list of saved pids, `wait' without
> -n or pid arguments won't look for it any more (and will clear the list
> when it completes). Why should `wait -n' without pid arguments do so?

'wait' without -n or pid arguments doesn't look in the list of saved
pids and statuses simply because it would serve no purpose for it to
do so. The return status will be 0, no matter how any child process
terminated, or even if there never was a child process. *
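
To make that concrete, here is a minimal sketch of the difference in
return status between 'wait' with and without a pid argument. The
variable names (plain_status, pid_status) are only for illustration:

```bash
#!/usr/bin/env bash
# A child that fails, reaped by plain 'wait'.
( exit 3 ) &
wait
plain_status=$?     # status of 'wait' itself, not of the child

# The same kind of child, reaped by explicit pid.
( exit 3 ) &
pid=$!
wait -- "$pid"
pid_status=$?       # exit status of that specific child

echo "plain wait: $plain_status, wait with pid: $pid_status"
# prints "plain wait: 0, wait with pid: 3"
```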

For 'wait -n', on the other hand:
"If the -n option is supplied, wait waits for a single job from the
list of ids or, if no ids are supplied, any job, to complete and
returns its exit status."
People are naturally going to expect 'wait -n' without id arguments
to return immediately with the exit status of an already-terminated
child process. In order to do so, 'wait -n' obviously has to look in
the list of saved pids and statuses.

> > On Fri, Jul 12, 2024 at 9:06 PM Greg Wooledge <g...@wooledge.org> wrote:
> >>
> >> greg@remote:~$ cat ~greybot/factoids/wait-n; echo
> >> Run up to 5 processes in parallel (bash 4.3): i=0 j=5; for elem in 
> >> "${array[@]}"; do (( i++ < j )) || wait -n; my_job "$elem" & done; wait
> >
> > He'd have to do something like this:
> > set -o noglob
> > i=0 j=5
> > declare -a pid_set=()
> > for elem in "${array[@]}"; do
> > >    if (( i++ >= j )); then
> >      wait -n -p terminated_pid -- "${!pid_set[@]}"
> >      unset pid_set[terminated_pid]
> >    fi
> >    my_job "$elem" &
> >    pid_set[${!}]=''
> > done
> > wait
> >
> > It's probably best that 'wait -n' without arguments and 'wait -n' with
> > explicit pid arguments have the same relationship to each other as
> > 'wait' without arguments and 'wait' with explicit pid arguments.
>
> That's pretty much what we're talking about here. `wait' without arguments
> doesn't look in the list of saved statuses whether -n is supplied or not.
> `wait' with pid argument should look in this list whether -n is supplied or
> not. But see below for the differences between `wait' with and without pid
> arguments whether -n is supplied or not.

Try to put yourself in the shoes of someone who doesn't know the first
thing about how bash handles child processes internally or its
implementation of 'wait'.

> > In other words, process substitutions notwithstanding,
> > $ wait
> > and
> > $ wait -- "${all_child_pids[@]}"
> > do the same thing.
>
> That's just not true, and they're not even defined to do the same thing.
> If you ask for a specific pid argument, wait will return its exit status
> even if the job it belongs to has been removed from the jobs list and
> saved on the list of saved pids and statuses. wait without pid arguments
> just makes sure there are no running child processes and clears the list of
> saved statuses -- it has no reason to look at the saved pid list before it
> clears it.

Yes, but return status aside, from the user's perspective, they do the
same thing. If you explicitly wait for the pids of all the background
jobs that have ever been forked, all elements of the list of saved
pids and statuses will have been cleared. Same deal if you provide no
id arguments.
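
A rough sketch of what I mean, assuming a non-interactive script. The
names left_after_plain, left_after_pids and statuses are made up for
this example. In both cases nothing is left running afterwards; the
only visible difference is that waiting by pid also hands back each
child's status:

```bash
#!/usr/bin/env bash
# Group A: reap everything with plain 'wait'.
for code in 0 1 2; do
  ( exit "$code" ) &
done
wait
left_after_plain=$(jobs -pr)    # pids of still-running jobs

# Group B: reap the same kind of children by explicit pid.
declare -a all_child_pids=() statuses=()
for code in 0 1 2; do
  ( exit "$code" ) &
  all_child_pids+=("$!")
done
for pid in "${all_child_pids[@]}"; do
  wait -- "$pid"
  statuses+=("$?")              # per-child exit status
done
left_after_pids=$(jobs -pr)

echo "running after plain wait: '${left_after_plain}'"
echo "running after wait with pids: '${left_after_pids}'"
echo "statuses seen by pid: ${statuses[*]}"
```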

> > So,
> > $ wait -n
> > and
> > $ wait -n -- "${all_child_pids[@]}"
> > should also do the same thing.
>
> One issue here is that wait without arguments clears the list of saved
> statuses. `wait -n' without arguments doesn't do that, but it probably
> should since it's now going to have access to that list, though it would
> no doubt break some existing use cases.

If 'wait -n' without arguments were to instead return the termination
status of one of the already-terminated child processes, then the user
would expect it to only clear that particular element of the list of
saved pids and statuses.

> The other issue is as above: why should `wait -n' with no pid arguments
> do anything with processes on this list? And if you think it should, what
> should it do with those processes?

I'd say the comments from kre and Greg still apply here:

On Fri, Jul 12, 2024 at 8:41 PM Robert Elz <k...@munnari.oz.au> wrote:
>
> [U]se the first definition of "next job to
> finish" - and in the case when there are already several of them,
> pick one, any one - you could order them by the time that bash reaped
> the jobs internally, but there's no real reason to do so, as that
> isn't necessarily the order the actual processes terminated, just
> the order the kernel picked to answer the wait() sys call, when
> there are several child zombies ready to be reaped.

On Fri, Jul 12, 2024 at 9:06 PM Greg Wooledge <g...@wooledge.org> wrote:
>
> If two jobs happen to finish simultaneously, the next call to wait -n
> should reap one of them, and then the call after that should reap
> the other.  That's how everyone wants it to work, as far as I've seen.
>
> *Nobody* wants it to skip the job that happened to finish at the exact
> same time as the first one, and then wait for a third job.  If that
> happens in the loop above, you'll have only 4 jobs running instead of 5
> from that point onward.

That said, if you've changed 'wait -n' with pid arguments the way it
sounds like you have, then it probably provides the answer already.

If my ${pids[@]} array contains the pids of all background jobs that
have ever been forked, along with "the last-executed process
substitution, if its process id is the same as $!", then
$ wait -n -p terminated_pid
and
$ wait -n -p terminated_pid -- "${pids[@]}"
can actually do the same thing as each other. In either case, the user
would expect it to return with the termination status of one of these
child processes, and the id of that terminated child process should
show up in terminated_pid. There's no reason not to do this, since
there would be one call to 'wait -n' per child process, regardless.
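
Here is a sketch of the explicit-pid form of that pattern, one call to
'wait -n' per child. It assumes bash 5.1 or later for 'wait -p', and
the sleeps are staggered so that each child is still running when
'wait -n' is called, which sidesteps the already-terminated case under
discussion. The names pid_set and reaped_order are illustrative:

```bash
#!/usr/bin/env bash
# Requires bash 5.1 or later for 'wait -p'.
declare -a pid_set=()        # sparse array indexed by pid
declare -a reaped_order=()   # exit statuses, in the order they were reaped

delay=0
for code in 1 2 3; do
  delay=$(( delay + 3 ))
  ( sleep "0.$delay"; exit "$code" ) &   # children finish about 0.3s apart
  pid_set[$!]=''
done

while (( ${#pid_set[@]} > 0 )); do
  wait -n -p terminated_pid -- "${!pid_set[@]}"
  status=$?
  # 'wait -p' leaves the variable unset if no child was found; stop then.
  [[ -n ${terminated_pid-} ]] || break
  reaped_order+=("$status")
  unset "pid_set[$terminated_pid]"
done

echo "reaped ${#reaped_order[@]} children, statuses: ${reaped_order[*]}"
```

If 'wait -n -p terminated_pid' without id arguments behaved the same
way, the pid_set bookkeeping would only be needed by scripts that care
which child finished.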

* Or in any case, 'wait' without arguments as the first command in a
script seemed to have a termination status of 0. But then the manual
says:
"If none of the supplied arguments is a child of the shell, or if no
arguments are supplied and the shell has no unwaited-for children, the
exit status is 127."

Honestly, the whole description of 'wait' in the manual is a little confusing.
"If id is not given, wait waits for all running background jobs and
the last-executed process substitution, if its process id is the same
as $!, and the return status is zero."
"Otherwise, the return status is the exit status of the last process
or job waited for."
It might benefit from splitting the descriptions of 'wait' and 'wait
-n' into separate paragraphs, each of which clearly defines what
happens with and without explicit id arguments. And a separate
paragraph for -f for good measure.
