Re: 'wait -n' with and without id arguments

Chet Ramey Thu, 17 Oct 2024 15:15:26 -0700

On 9/29/24 12:55 PM, Zachary Santer wrote:

CWRU/CWRU.chlog:

    9/25
    ----
jobs.c
- wait_for_any_job: if the jobs table is empty and there are no
   eligible procsubs, and the shell is in posix mode, take a random
   pid from the bgpids table, delete it, and return its status
   (since we would be deleting that pid from bgpids anyway)


This is a really strange thing to implement.


It turned an error condition into a potentially successful return.
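
The error condition in question can be seen from a script in default (non-posix) mode: once every background job has been waited for, a further 'wait -n' without id arguments has nothing left to report. The following is a minimal sketch, assuming a bash new enough to support 'wait -n' is on PATH; the exit values 7 and 9 and the sleep durations are arbitrary choices made for illustration, not taken from the thread.

```shell
# Run the snippet in a fresh, non-interactive bash in default mode.
result=$(bash -c '
  ( sleep 0.2; exit 7 ) &     # terminates first
  ( sleep 1;   exit 9 ) &     # terminates second
  wait -n; a=$?               # status of the first job to terminate
  wait -n; b=$?               # status of the remaining job
  wait -n 2>/dev/null; c=$?   # nothing left to wait for
  echo "$a $b $c"
')
echo "$result"
```

The third status is the one the changelog entry is about: in default mode it is the "no children" error, and the 9/25 change only alters that case for posix mode.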

So potentially 'wait -n' is
waiting for a background job that's still executing when there's an
already-terminated background job that 'wait -n' would report right
then, had it not been notified.


This is reasonable. I changed the order in recent pushes.

On Wed, Sep 25, 2024 at 11:06 AM Chet Ramey <chet.ra...@case.edu> wrote:


On 9/8/24 8:35 PM, Zachary Santer wrote:

This is still a discussion about interactive shell behavior only.


I might argue that calling 'jobs' within a script being executed
normally shouldn't make background jobs that have already terminated
unavailable to 'wait -n' either.


Make up your mind. Is non-interactive shell behavior good, as you've said
before, and again at the end of this message, or is it not?

I'm considering using posix mode all the time, just to see if it makes
my life easier. Not that I know what it does, outside of this.


You could read the file POSIX in the devel branch, or break down and
read the texinfo manual, or look at

http://tiswww.case.edu/~chet/bash/POSIX

though that describes the current release.

That's not how job control works. Jobs are created and job numbers assigned
when the background process is created.


I was thinking the job id doesn't do the user any good if it's for a
job that they don't have the opportunity to act upon. It's come and
gone by the time they've got a command line again.


Act on how? If it terminates, you're not going to jobs/fg/bg/kill. You
can still wait.

(And who says they won't? Consider a SIGCHLD trap, or a command later
in the list running one of the above commands.)

2) Job ids assigned to background jobs continue to increase
monotonically, between accept-line and prompt, even as some of those
jobs are removed from the jobs table by calls to 'wait'.


No shell works this way, and there's not a good reason to adopt it.


I might be missing something, but bash sure seems to be doing this in
a number of different calls to wait-n-failure::main, on the current
devel branch commit. Are the jobs not being removed from the jobs
table until some later point?


What? When jobs are added to the jobs list, they use the index after
the largest index with a job. If you have four jobs, 1-4, and job 3
terminates, the next job created gets index 5. If job 4 terminates
instead, the next job created gets index 4. Is this not what you're
seeing?


It feels like I've confused myself by this point. I was considering
what bash would have to do, to not display job notifications at any
point except immediately prior to displaying the following prompt.
Bash is, of course, displaying job status notifications as jobs are
forked and as they terminate. That *is* prior to the following prompt,
but not *immediately* so. So the behavior I was trying to describe
would be a departure from the current posix mode behavior, but it
clearly isn't necessary.


Not in posix mode. This is how bash has always behaved in non-posix mode:

$ sleep 3 & sleep 4; echo a
[1] 48391
[1]+  Done                       sleep 3
a
$

And this is posix mode:

$ sleep 3 & sleep 4; echo a
[1] 48395
a
[1]+  Done                       sleep 3
$

If the 'jobs' builtin is called in the midst of a command list being
run with either behavior, this would cause the same updates to the
jobs table and list of saved pids and statuses as would occur
immediately prior to a prompt.


So you are saying that prompt notifications and `jobs' have the same
effect. POSIX implies but does not require this, and there is differing
behavior among current implementations.


I've got no opinion on this point, actually.


You just described it. Are you saying you don't mind either behavior?

The user would have to know that
calling the 'jobs' builtin would have an impact on what processes
'wait -n' without id arguments will return the termination status of.
That would have to be documented in the man page.


This is posix mode.


How does the user know that?


How does a user know anything? What's the difference between "documented
in the man page" (presumably in JOB CONTROL or the `wait' description?)
and "documented as part of posix mode"?

In that case, the
behavior they would see, using 'wait -n', has already changed for the
better. The use of 'wait -n' without pid arguments in an interactive
environment is more likely to be something that a user just typed on
the command line themselves.


Why would a user do this? What's the use case for doing that in an
interactive shell? Not that it really matters.


Maybe testing out a bit of functionality they're trying to implement elsewhere.


I would hope that people understand the difference between interactive
and non-interactive shells and how they can have different behavior.

If the behavior here isn't modified, the man page really should note that
'wait -n' without id arguments won't return the termination status of a
child process that has already been notified through the 'jobs' output.


That is exactly the behavior posix seems to require (`wait -n' aside, but
see below): once you notify the user, you delete the job and it disappears
forever.


Should still be in the man page. Very few shell programmers are
reading the POSIX standard.


I added a mention to the job control section in the man page and info file,
and reworked the text in the posix mode section.


There is a posix mode section in bash.info.


There is also a URL in the man page that links to a file with the same
information. The man "page" is already 95 pages; does it really need to
be longer?

bash.info:

This manual is meant as a brief introduction to features found in
Bash.  The Bash manual page should be used as the definitive reference
on shell behavior.


That means that if the man page and the info file differ on something,
the man page is authoritative, not that the info file is meaningless.
Otherwise, why have it?

For instance, with the latest devel branch build:

$ set -o posix
$ sleep 2 &
[1] 20565
$
[1]+  Done                       sleep 2
$ wait 20565
$ echo $?
0
$ wait 20565
bash: wait: pid 20565 is not a child of this shell
$ echo $?
127
$

This is what you refer to below.


Yeah, I think that's an improvement. As long as posix mode never makes
that termination status unavailable to the first 'wait' call, because
it was already notified, then posix mode seems like the way to go.


We'll see.

'wait -n' with pid arguments now has access to this list, which is
good. It wouldn't be going much further to allow 'wait -n' without pid
arguments to act on the list as well.


Well, you'd have to arrange things so the user doesn't get the same pid
and status returned multiple times, either by removing it from this list
or by some other mechanism. Since that's what happens in posix mode, it
looks like posix mode fits your use case here.
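
The posix mode behavior described here (a saved status is handed out once and then forgotten) can also be checked non-interactively. This is a rough sketch, assuming bash is on PATH; the variable names and the exit value 3 are made up for illustration. The child sleeps briefly so that it is still running when the first 'wait' is issued.

```shell
# The same snippet is run twice: in default mode and in posix mode.
snippet='
  ( sleep 0.2; exit 3 ) &
  pid=$!
  wait "$pid";             first=$?    # collects the real exit status
  wait "$pid" 2>/dev/null; second=$?   # asks for the same pid again
  echo "$first $second"
'
default_result=$(bash -c "$snippet")
posix_result=$(bash --posix -c "$snippet")
echo "default mode: $default_result"
echo "posix mode:   $posix_result"
```

In posix mode the second 'wait' should fail with 127, matching the interactive session quoted above. What the second 'wait' returns in default mode is the part to compare against; it may differ between bash versions.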


And now I know that, but I don't even use 'wait -n' for anything.


Then we're just having an academic conversation.

The point here was to try to get the behavior of 'wait -n' to be as
consistent as possible, between different execution environments: the
interactive shell, a script being sourced, and a script being executed
normally; along with different set and shopt options. If you won't
consider modifying the behavior of 'wait -n' without id arguments in
default mode, then that's frustrating.


You might want to try posix mode for a while and see what happens. There
are very few people who do that; I'd be interested in feedback.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/

