CWRU/CWRU.chlog:
> 8/26
> ----
> execute_cmd.c
> [...]
> - execute_connection: in default mode, bash performs jobs notifications
> in an interactive shell between commands separated by ';' or '\n'.
> It shouldn't do this in posix mode, since posix now specifies when
> notifications can take place
I'd forgotten your comment below about the shell not being interactive whenever it isn't reading input from the user. So I took the entry above to mean that, in posix mode, bash would only ever print 'jobs' notifications immediately before a prompt. If it doesn't mean that, I don't understand what posix mode changes relative to the existing behavior.

> jobs.c
> - notify_and_cleanup: make interactive shells notifying during sourced
> scripts dependent on the shell compatibility level and inactive in
> versions beyond bash-5.2
> Inspired by report from Zachary Santer <zsan...@gmail.com>

Suppressing 'jobs' notifications only while the interactive shell is sourcing a script misses some cases. One is a function executed directly from the command line. Another, of course, is a whole bunch of commands separated by semicolons entered on one command line.

New wait-n-failure attached. (Apparently ${SECONDS} can't be declared local and still work.)

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: msys
Compiler: gcc
Compilation CFLAGS: -g -O2
uname output: MSYS_NT-10.0-19045 Zack2021HPPavilion 3.5.3-d8b21b8c.x86_64 2024-07-09 18:03 UTC x86_64 Msys
Machine Type: x86_64-pc-msys

Bash Version: 5.3
Patch Level: 0
Release Status: alpha

Devel branch commit 2610d40b.
$ ./bash ~/random/wait-n-failure run
run true
explicit_pids false
monitor false
notify false
posix false
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
11 seconds
$ ./bash ~/random/wait-n-failure run explicit_pids
run true
explicit_pids true
monitor false
notify false
posix false
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
12 seconds
$ ./bash ~/random/wait-n-failure run monitor
run true
explicit_pids false
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
11 seconds
$ ./bash ~/random/wait-n-failure run monitor notify
run true
explicit_pids false
monitor true
notify true
posix false
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
11 seconds
$ ./bash ~/random/wait-n-failure run monitor posix
run true
explicit_pids false
monitor true
notify false
posix true
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
11 seconds
$ ./bash ~/random/wait-n-failure run explicit_pids monitor
run true
explicit_pids true
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
12 seconds

All good.

$ source ~/random/wait-n-failure run
run true
explicit_pids false
monitor false
notify false
posix false
bash 5.3.0(1)-alpha
96 processes waited / 100 processes forked
12 seconds

Hmm.

$ source ~/random/wait-n-failure run explicit_pids
run true
explicit_pids true
monitor false
notify false
posix false
bash 5.3.0(1)-alpha
100 processes waited / 100 processes forked
10 seconds

Better.
$ source ~/random/wait-n-failure run monitor
run true
explicit_pids false
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
[5]+ Done wait-n-failure_random_sleep
[1] Done wait-n-failure_random_sleep
[2] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[4]- Done wait-n-failure_random_sleep
[5]- Done wait-n-failure_random_sleep
[6]- Done wait-n-failure_random_sleep
[7]- Done wait-n-failure_random_sleep
[8] Done wait-n-failure_random_sleep
[9] Done wait-n-failure_random_sleep
[10]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[... All following "Done" notifications are for jobs with job id 1.]
96 processes waited / 100 processes forked
11 seconds

I did not expect to see job notifications here. The changelog seems pretty clear that there shouldn't be any.

We do get to see what was going on above, though. After a little while, only one child process is running at a time, which is why they all get assigned job id 1. So 'wait -n' is now guaranteed to wait for *something*, but it won't necessarily wait for everything. By the time the script completes, four concurrent processes have been lost.

$ source ~/random/wait-n-failure run monitor notify
run true
explicit_pids false
monitor true
notify true
posix false
bash 5.3.0(1)-alpha
[1] Done wait-n-failure_random_sleep
[2] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[4] Done wait-n-failure_random_sleep
[5]- Done wait-n-failure_random_sleep
[6]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[... All job id 1 again.]
96 processes waited / 100 processes forked
12 seconds

Same deal here.
$ source ~/random/wait-n-failure run monitor posix
run true
explicit_pids false
monitor true
notify false
posix true
bash 5.3.0(1)-alpha
[2] Done wait-n-failure_random_sleep
[1] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[4] Done wait-n-failure_random_sleep
[5]- Done wait-n-failure_random_sleep
[6]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[1]+ Done wait-n-failure_random_sleep
[... All job id 1.]
96 processes waited / 100 processes forked
12 seconds

I wasn't expecting 'jobs' output while sourcing. I also thought posix mode would keep bash from printing any 'jobs' information until immediately before a prompt.

$ source ~/random/wait-n-failure run explicit_pids monitor
run true
explicit_pids true
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
[1] Done wait-n-failure_random_sleep
[2] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[4] Done wait-n-failure_random_sleep
[5] Done wait-n-failure_random_sleep
[6] Done wait-n-failure_random_sleep
[7] Done wait-n-failure_random_sleep
[8]+ Done wait-n-failure_random_sleep
[1] Done wait-n-failure_random_sleep
[2] Done wait-n-failure_random_sleep
[...]
100 processes waited / 100 processes forked
11 seconds

This is much more what I would expect to see for job ids. It's already a lot of test output to put in the body of an email. Still, the ids go up and down, and they never settle at job id 1 for every job.

Next, we make the functions available on the command line:

$ source ~/random/wait-n-failure run false
$ wait-n-failure_main
explicit_pids false
monitor false
notify false
posix false
bash 5.3.0(1)-alpha
[1] 2295
[2] 2296
[3] 2297
[4] 2298
[5] 2299
[6] 2300
1 processes waited / 6 processes forked
0 seconds

I guess telling the user the job id and pid of the background job that was just forked isn't specifically a monitor-mode thing?
Here, though, we see basically the same behavior we saw when sourcing wait-n-failure before you made this change to the devel branch. You've just narrowed the cases where it shows up.

$ wait-n-failure_main explicit_pids
explicit_pids true
monitor false
notify false
posix false
bash 5.3.0(1)-alpha
[1] 2315
[2] 2316
[3] 2317
[4] 2318
[5] 2319
[6] 2320
[7] 2322
[1] 2324
[1] 2326
[1] 2328
[... Job ids hover around 1 and 2 a lot, but they climb all the way up to 9 and back down again. The foreground call to wait-n-failure_random_sleep () is probably what causes this.]
100 processes waited / 100 processes forked
10 seconds

Again, the goal is for calling this function without explicit_pids to behave the way it currently does when called with that argument. That holds at least when there are no preexisting, un-waited-for child processes.

$ wait-n-failure_main monitor
explicit_pids false
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
[1] 2510
[2] 2511
[3] 2512
[4] 2513
[5] 2514
[1] Done wait-n-failure_random_sleep
[2] Done wait-n-failure_random_sleep
[4]- Done wait-n-failure_random_sleep
[6] 2515
[3] Done wait-n-failure_random_sleep
[5]- Done wait-n-failure_random_sleep
[6]+ Done wait-n-failure_random_sleep
[1] 2517
[1]+ Done wait-n-failure_random_sleep
2 processes waited / 7 processes forked
0 seconds

Bad.

$ wait-n-failure_main monitor notify
explicit_pids false
monitor true
notify true
posix false
bash 5.3.0(1)-alpha
[1] 2519
[2] 2520
[1]- Done wait-n-failure_random_sleep
[3] 2521
[4] 2522
[5] 2523
[4]- Done wait-n-failure_random_sleep
[6] 2524
[5]- Done wait-n-failure_random_sleep
[2] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[6]+ Done wait-n-failure_random_sleep
1 processes waited / 6 processes forked
0 seconds

Bad.
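This is for anyone following along without the attachment. The explicit_pids mode presumably passes the PIDs of the outstanding background jobs to 'wait -n'. The sketch below is a minimal, hypothetical version of that technique, not the attached script. It makes these assumptions:

- bash 5.1 or later, for 'wait -n' with id arguments and 'wait -p'.
- 'sleep' stands in for wait-n-failure_random_sleep.
- The names pending, reaped_pid, forked and waited are made up for illustration.

Each reaped PID is removed from the list before the next call, so every forked child should be waited for exactly once.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the attached wait-n-failure script): fork several
# background jobs, remember their PIDs, then reap them one at a time with
# 'wait -n -p VAR PID...'. Assumes bash >= 5.1 for 'wait -p'.

declare -A pending=()            # PID -> index of the job that was forked
forked=0
for i in 1 2 3 4 5; do
  sleep "0.$(( RANDOM % 5 ))" &  # stand-in for wait-n-failure_random_sleep
  pending[$!]=$i
  forked=$(( forked + 1 ))
done

waited=0
while (( ${#pending[@]} > 0 )); do
  # Wait only for the PIDs still outstanding; -p stores the reaped PID.
  wait -n -p reaped_pid "${!pending[@]}"
  # bash unsets reaped_pid first, so it stays unset if nothing was reaped.
  [[ -n ${reaped_pid-} ]] || break
  unset "pending[$reaped_pid]"
  waited=$(( waited + 1 ))
done

echo "$waited processes waited / $forked processes forked"
```

Passing the ids explicitly sidesteps the question of which already-terminated children a bare 'wait -n' will still report.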
$ wait-n-failure_main monitor posix
explicit_pids false
monitor true
notify false
posix true
bash 5.3.0(1)-alpha
[1] 2526
[2] 2527
[3] 2528
[4] 2529
[5] 2530
[1] Done wait-n-failure_random_sleep
[6] 2531
[2] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[4] Done wait-n-failure_random_sleep
[5]- Done wait-n-failure_random_sleep
[6]+ Done wait-n-failure_random_sleep
1 processes waited / 6 processes forked
1 seconds

Bad.

$ wait-n-failure_main explicit_pids monitor
explicit_pids true
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
[1] 2533
[2] 2534
[3] 2535
[4] 2536
[5] 2537
[2] Done wait-n-failure_random_sleep
[6] 2538
[1] Done wait-n-failure_random_sleep
[3] Done wait-n-failure_random_sleep
[4] Done wait-n-failure_random_sleep
[7] 2540
[...]
100 processes waited / 100 processes forked
10 seconds

Good.

On Mon, Aug 26, 2024 at 10:57 AM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 8/14/24 11:22 PM, Zachary Santer wrote:
> > On Wed, Aug 14, 2024 at 3:22 PM Chet Ramey <chet.ra...@case.edu> wrote:
> >>
> >> On 8/7/24 2:47 PM, Zachary Santer wrote:
> >
> >>> If you want the behavior of 'wait -n' to be
> >>> consistent between scripts and the interactive shell, then it should
> >>> choose one terminated child process from the list of those that is
> >>> maintained in the interactive shell, if it's nonempty, to report to
> >>> the user and to clear from that list, any time it is called.
> >>
> >> I'm not sure returning the status of some random process from some
> >> arbitrary point in the past is going to be valuable.
> >
> > I think the value is in the consistent behavior of 'wait -n', which
> > this would provide. If the user is intent on running 'wait -n' without
> > id arguments in the interactive shell, they can ensure that child
> > processes forked long ago are ignored by simply calling 'wait' without
> > -n before moving on to what they're trying to do.
>
> Sure, they can do that. That's a new requirement, though.
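Here is a minimal sketch of the workaround described in the quoted text. It assumes an ordinary non-interactive script, so no 'jobs' notifications are involved. A plain 'wait' first reaps every existing child. Only then does a 'wait -n' loop count the new batch, stopping when 'wait -n' returns 127, the status bash uses when there are no unwaited-for children. The variable names are illustrative, and 'sleep' stands in for real work.

```bash
#!/usr/bin/env bash
# Sketch of the "plain 'wait' first" workaround: reap every existing child,
# then count 'wait -n' returns for a new batch only.

sleep 0.1 &                    # stands in for a child forked "long ago"
wait                           # no -n: waits for all existing children

forked=0
for i in 1 2 3 4; do
  sleep "0.$i" &
  forked=$(( forked + 1 ))
done

waited=0
while :; do
  wait -n
  # 127 means there are no unwaited-for children left
  (( $? == 127 )) && break
  waited=$(( waited + 1 ))
done

echo "$waited processes waited / $forked processes forked"
```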
I've seen you point out that "I can't imagine why a person would do X, so it must never happen" is fallacious. However, I think the benefit of consistent behavior far outweighs the hardship it would cause one kind of script author. That is someone who writes a script for use within the interactive shell that depends on 'wait -n' without id arguments ignoring background processes the user has already been notified of through the 'jobs' output.

If the behavior here isn't modified, the man page really should note that 'wait -n' without id arguments won't return the termination status of a child process that has already been reported through the 'jobs' output. For that matter, this still happens in the interactive shell when job control is disabled. Just having to come up with a way to explain this behavior in the man page seems like solid motivation to change it.

> > On Wed, Aug 14, 2024 at 4:44 PM Robert Elz <k...@munnari.oz.au> wrote:
> >>
> >> | Maybe the thing to do is to retain jobs in the job list, even after
> >> | they're marked as notified,
> >>
> >> I'd do the opposite, once they're notified, they should be deleted
> >> from the jobs table, and everywhere else. But "notified" only happens
> >> when the script explicitly asks (in a non-interactive shell, never because
> >> of any other event than an appropriate command issued by the script, and
> >> in an interactive shell, the same, or the implicit "jobs" before each PS1).
> >
> > The implicit 'jobs' isn't happening before each PS1,
>
> This isn't what POSIX says to do, anyway.
>
> > but after each
> > command completes. Thus, all the
> >> [1] Done random_sleep
> > notifications when sourcing wait-n-failure, before it prints
> >> 3 processes waited / 8 processes forked
> >> 1 seconds
> > and exits.
>
> Kind of. The `interactive shell' isn't interactive while it's not reading
> input from the terminal, so the shell prints notifications when a job
> terminates. This is what happens when you source a file.
So my initial understanding of what 'set -o posix' is now supposed to do was wrong?

> > So, actually only doing the implicit 'jobs' work and moving things
> > from the jobs table to the list of saved pids and statuses before each
> > PS1 *would* be a solution here.
>
> Before the next prompt, you probably mean.
>
> > When sourcing wait-n-failure, it's
> > going to do all its work before any PS1 prompt.
>
> The behavior of performing notifications and removing jobs from the table
> is long-standing: it's been this way since 1999, and is a mechanism to
> prevent long-running sourced scripts from filling up the jobs list (which
> was a lot smaller in '99). So you need to accommodate those backwards
> compatibility issues somehow.

Then the less disruptive change would be this. 'wait -n' without id arguments would report the termination status of a child process that has already been reported to the user through the 'jobs' output. It would then clear that entry from the list of saved pids and statuses.

> > I'm less concerned about what happens when a user types 'wait -n'
> > independently on the command line. The human is in the loop at that
> > point.
>
> The shell is interactive at that point; different rules apply.
>
> >
> >>> So basically, 'wait -n' should be implemented such that sourcing the
> >>> script with a false argument gives the same behavior as you've seen
> >>> when sourcing it with a true argument: the infinite loop.
> >>
> >> How long should notification be deferred? Until the script completes?
> >
> > That's more or less the solution I presented above. 'wait -n' without
> > id arguments returning the termination status of a child process that
> > the user has already been informed of through the implicit 'jobs'
> > output would also work, and might be less of a weird behavior change
> > for users to get over.
>
> OK. How would you reconcile the backwards compatibility issue?
There's always ${BASH_COMPAT}. But consider how surprising, and arguably undesirable, it is for 'wait -n' without id arguments not to return the termination status of a child process that has already been reported through the 'jobs' output. Given that, I would really question why anyone would write code that depends on that behavior in the first place. And again, this issue never comes up in a script meant to be run normally (without it calling 'jobs'). The whole thing is a corner case, though it seems like an easily solved problem.

> There are only three approaches.

And those are?

On Mon, Aug 26, 2024 at 11:01 AM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 8/16/24 8:21 AM, Zachary Santer wrote:
> > On Wed, Aug 14, 2024 at 11:22 PM Zachary Santer <zsan...@gmail.com> wrote:
> >>
> >> The implicit 'jobs' isn't happening before each PS1, but after each
> >> command completes. Thus, all the
> >>> [1] Done random_sleep
> >> notifications when sourcing wait-n-failure, before it prints
> >>> 3 processes waited / 8 processes forked
> >>> 1 seconds
> >> and exits.
> >>
> >> So, actually only doing the implicit 'jobs' work and moving things
> >> from the jobs table to the list of saved pids and statuses before each
> >> PS1 *would* be a solution here. When sourcing wait-n-failure, it's
> >> going to do all its work before any PS1 prompt. Same deal if a user
> >> wants to call a function with 'wait -n' in it from the command line,
> >> invoke the edit-and-execute-command readline command, or just type a
> >> bunch of different commands separated by semicolons into a single
> >> command line.
> >
> > This breaks down with 'set -b'/'set -o notify'. Short of 'wait -n'
> > printing a warning message or erroring out when it is invoked while
> > 'set -b' is active, this isn't a complete solution.
>
> If you enable the notify option, which is not the default, you should be
> responsible for managing the consequences.
> notify is always going to result
> in different behavior; see
>
> https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_11

The bash manual doesn't make clear that there's any relationship between printed 'jobs' notifications and what 'wait -n' without id arguments will report. Under the (fair) assumption that there is none, one would think that 'set -b' would have no effect either.

> > I really think the solution here is for 'wait -n' to return the
> > termination status of a child process that has already terminated and
> > that the user has already been informed of. Ultimately, whatever set
> > of commands is being invoked together and the user who is being
> > informed of terminated child processes are two different things.
> > Informing the user does nothing for the set of commands.
>
> No, that counts as notification. After the user is notified, the shell
> is free to remove the job from the list. Bash happens to keep the status
> around for a while; Bash does that because that behavior is more useful.

The user might want to call 'wait' with an id argument and retrieve that process's termination status programmatically, even though the 'jobs' output has already informed them. In the same vein, it's more useful for 'wait -n' to guarantee a one-to-one relationship between forked child processes and the termination statuses it returns.

> kre, for instance, advocates removing it entirely.

That would preclude what he was asking for earlier, wouldn't it?
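Here is a small sketch of the point about retained statuses, assuming a non-interactive script. The background child exits with status 3, an arbitrary value. It has very likely terminated, and been reaped by the shell, before the explicit 'wait "$pid"'. That call still returns 3, because bash saves the status of reaped background processes. Chet's note that bash keeps the status around suggests the same holds after a 'Done' notification in an interactive shell, but this sketch doesn't exercise that case.

```bash
#!/usr/bin/env bash
# Sketch: bash retains the exit status of a background child after it
# terminates, so a later 'wait PID' still returns it. Exit code 3 is an
# arbitrary value chosen for illustration.

( exit 3 ) &
pid=$!
sleep 0.3                      # give the child time to terminate first
wait "$pid"
status=$?
echo "status of $pid: $status"
```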
On Fri, Jul 12, 2024 at 8:41 PM Robert Elz <k...@munnari.oz.au> wrote:
>
> [U]se the first definition of "next job to
> finish" - and in the case when there are already several of them,
> pick one, any one - you could order them by the time that bash reaped
> the jobs internally, but there's no real reason to do so, as that
> isn't necessarily the order the actual processes terminated, just
> the order the kernel picked to answer the wait() sys call, when
> there are several child zombies ready to be reaped.

Removing the status entirely once the 'jobs' output has reported it would prevent the above from working, right? Or maybe, at the time, he had the same impression I did. I thought 'wait -n' would always fail to report the termination status of child processes that terminated before it was called. And when a status goes unreported because of a race between the 'jobs' output and the call to 'wait -n', that's okay?
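kre's "pick one, any one" rule, combined with deleting a status once it's reported, is what would give the one-to-one relationship argued for above. What follows is a toy model of those semantics only. It is not how bash implements 'wait -n', and its names (saved, record_exit, wait_n_model, model_pid, model_status) are hypothetical. Unlike the real builtin, wait_n_model returns 0 whenever it reports something, which keeps the loop simple.

```bash
#!/usr/bin/env bash
# Toy model only (hypothetical; not bash internals): a table of saved
# PID -> status pairs, and a wait_n_model function following the
# "pick one, any one" rule. Each call reports one saved status and deletes
# it, so every recorded child is reported exactly once.

declare -A saved=()

record_exit() {                  # models the shell reaping a child
  saved[$1]=$2
}

wait_n_model() {
  local pid
  for pid in "${!saved[@]}"; do  # any entry will do; order is unspecified
    model_pid=$pid
    model_status=${saved[$pid]}
    unset "saved[$pid]"
    return 0                     # unlike real 'wait -n', not the child's status
  done
  return 127                     # nothing left to report
}

record_exit 101 0
record_exit 102 1
record_exit 103 0

reported=0
while wait_n_model; do
  echo "reported $model_pid (status $model_status)"
  reported=$(( reported + 1 ))
done
echo "$reported statuses reported, ${#saved[@]} left"
```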