Date: Wed, 8 Dec 2021 08:42:21 -0500 (EST)
From: Mouse <mo...@rodents-montreal.org>
Message-ID: <202112081342.iaa29...@stone.rodents-montreal.org>

| > That would not have worked, non-interactive shells are forbidden from
| > allowing the script to change the status of a signal which is ignored
| > when the shell starts: [...POSIX...]
|
| Then POSIX is broken and there needs to be a way to disable this
| particular bit of braindamage;

POSIX is (once again) caught between its two masters there.

On one hand it is (in this case; it is similar for the rest of the standard, just different wording would be used ... in my e-mail I mean) providing a specification for application script writers, telling them what the shell will do if they do xyz. For this, it must specify what (bug free) shells actually do (which is why we end up with a whole bunch of "... is unspecified", or worse, undefined in a few cases, when shells don't agree on what to do).

But this one is a case where the shells all do agree, as it has been like this since Bourne added the trap command in the earliest (released) Bourne shell. (If you like, it means that you have had > 40 years to submit a bug report about this, don't you think you're a little late now?)

So, POSIX needs to tell the script writers that if they attempt to set a trap on a signal which was ignored when the shell started, it will not work (similarly if they attempt to reset it to SIG_DFL).

On the other hand (the other master), POSIX is also a spec for shell implementors, telling them what to implement. It is sometimes difficult to distinguish those two functions, but if it were done, there are many places where the user could be told "this is what works" and the implementors could be told "implement it better than that, this is how it should work" - but there seems to be no interest in going down that path, and I kind of understand why: doing all of it would be a huge task, and if it was only part done, there would be real confusion about how to interpret the sections which had not been updated.

For what it is worth, I believe all this came about because way back in 6th edition days (or perhaps even earlier), also early 7th edition and 32V (probably even 3BSD as well), the paradigm for setting a signal handler in a C program was

	if (signal(s, SIG_IGN) != SIG_IGN)
		signal(s, handler);

(where handler could also be SIG_DFL, but obviously it would make no sense for it to be SIG_IGN). Often the result from the first signal() call would be saved so it could be restored when appropriate - but that depends upon the needs of the application, and is irrelevant to this discussion.

All this developed in the old days, when there was no way to block signals, no process groups, no job control, ... When a user typed the interrupt character, the kernel would simply queue a SIGINT (similarly for SIGQUIT and SIGHUP in appropriate circumstances) to all processes with that terminal as their controlling tty. Foreground processes, background processes, everything (not that there was any other type).

When a user ran a command in the background, the shell would start it with SIGINT and SIGQUIT ignored, so even though the kernel would send the relevant signal to the process every time the relevant key was typed, it would just be ignored. That is, provided the program did not enable signals. (SIGHUP was set to be ignored by the nohup command.)

So, for processes that wanted (including needed) to see SIGINT when run in the foreground, and so wanted to do

	oldsigint = signal(SIGINT, intr_handler);

the procedure was to code it as above, so as to make sure that if the program happened to be run in the background, there was not even the smallest window in which an intr char typed at the wrong time would affect the background process.

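The two line sequence above can be wrapped up as a small helper. This is only a sketch of the idiom as described here, not code from any historical program: the names catch_unless_ignored, sig_fn and got_intr are made up for illustration, and intr_handler is just the handler name used in the text.

```c
#include <signal.h>

/* Hypothetical names throughout; only signal(), SIG_IGN and SIGINT are real. */
typedef void (*sig_fn)(int);

static volatile sig_atomic_t got_intr = 0;

/* Handler as in the text: just note that the signal arrived. */
static void intr_handler(int signo)
{
    (void)signo;
    got_intr = 1;
}

/*
 * The classic sequence: first set the signal to SIG_IGN, and only if it
 * was not already ignored, install the handler.  A process started with
 * the signal ignored (e.g. a background job in the old days) never has
 * it enabled, not even briefly.  Returns the disposition found on entry
 * so the caller can restore it later if it wants to.
 */
static sig_fn catch_unless_ignored(int signo, sig_fn handler)
{
    sig_fn old = signal(signo, SIG_IGN);

    if (old != SIG_IGN)
        signal(signo, handler);
    return old;
}
```

A program would then call `oldsigint = catch_unless_ignored(SIGINT, intr_handler);` where it would otherwise have called signal() directly.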
Of course, this left a (very small) window while a foreground process was arranging to catch the signal (and in this, do remember that there was no practical way for a process to determine whether it was foreground or background, the only difference was whether the parent was waiting for it or not), where a typed intr would end up being ignored (if it just happened to occur during the small interval between the two signal calls). That's a much less serious problem, as when the first intr the user types doesn't work, they just send another one, which will work.

Now, really, all of this is only important for the tty generated signals, originally SIGHUP SIGINT and SIGQUIT (perhaps worth noting that they are 1 2 and 3 in the signal number list...) (and now SIGTSTP SIGTTIN SIGTTOU are added) - there's no real need to be concerned about a SIGSEGV or SIGEMT (or even SIGPIPE) being accidentally sent to the process, at just the wrong time, when it wasn't wanted. That just doesn't happen. Never did.

But that kind of analysis wasn't common, and perhaps wouldn't even have been believed by lots of people who had been affected by sloppy programs which didn't use this technique (for the tty signals) and caused background processes to just "mysteriously vanish" when the user sent an interrupt signal intended for some other process.

The long and short of it is that it became accepted that that 2 line sequence is the one true way to trap a signal, and anything else is broken, and must be fixed. Dumb perhaps, and barely even relevant in these days of process groups, where the tty signals are only sent to the foreground process group, and background processes never see them at all, but that's what happened, as best I remember and understood it all.

Anyway, what is clear is that that is exactly what Bourne wrote when his shell was coded and the trap command was added.

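To make the effect on a shell concrete, here is a toy model (emphatically not the Bourne shell source, just a sketch under my own assumptions) of a trap implementation built on that sequence. Everything prefixed toy_ or TOY_ is invented for illustration, including the table size.

```c
#include <signal.h>
#include <stddef.h>

#define TOY_NSIG 32     /* assumed table size, enough for the tty signals */

/* Hypothetical bookkeeping: the action string recorded for each signal. */
static const char *toy_trap_action[TOY_NSIG];

/* Hypothetical flag per signal: set when a trapped signal has arrived. */
static volatile sig_atomic_t toy_pending[TOY_NSIG];

static void toy_trap_handler(int signo)
{
    if (signo > 0 && signo < TOY_NSIG)
        toy_pending[signo] = 1;
}

/*
 * Model of "trap action signo": record the action, then enable the
 * signal using the two line sequence.  If the signal was ignored on
 * entry it stays ignored, yet the call still reports success, so the
 * script gets no error message.
 */
static int toy_set_trap(int signo, const char *action)
{
    if (signo <= 0 || signo >= TOY_NSIG)
        return -1;

    toy_trap_action[signo] = action;         /* bookkeeping always happens */

    if (signal(signo, SIG_IGN) != SIG_IGN)   /* the one true way ... */
        signal(signo, toy_trap_handler);
    return 0;
}
```

With SIGINT ignored beforehand, toy_set_trap(SIGINT, "echo caught") returns 0 and fills in the table, but a later SIGINT never reaches toy_trap_handler.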
That's why the standard says that an error message is not required when a script attempts to set a trap on an ignored signal: the shell just did all the normal trap setting bookkeeping, and then called its signal setting code to catch the signal, and that code used the sequence above, resulting in an already ignored signal remaining ignored. (It is actually a bit more complex than that, as the shell did allow, as do all current ones, a script to ignore a signal, and then set it back to default, or trap it, later, but that's all just implementation detail - the principle that a signal ignored on entry remains ignored was ingrained, and remains that way.)

Good luck getting this one changed. POSIX is certainly not going to even think about changing this as long as shells continue to behave the way they do (which in this case is to do what the standard says, which is also just copying what every other Bourne compat shell has done forever).

Getting shells to alter would be difficult as well, as when making a change like this, it is never clear what scripts might exist that depend upon the current behaviour, and no-one wants to break backwards compatibility with old scripts, or not without a very good reason, and lots of advance planning.

The most likely route forward would be for shells to implement some new builtin command, similar to trap, but with some of the current weirdness, including this, fixed. But to do that, how the new method interacts with the old would need to be sorted out.

kre