On Wed, Nov 11, 2020 at 02:34:57PM -0600, Derek Martin wrote: > [...] I would also agree that Mutt's exit status handling is > incomplete. That probably needs to be fixed. I'd need to review > the code in more detail to be confident about whether > blocking/ignoring SIGCHLD is appropriate or not.
So I've just had a look at this, and I have some comments. A proper fix would be...complicated. Bottom line, Mutt's code IS definitely causing part of the headache, and I'd hoped to provide a proposal to fix it with this message, but it would take more of a rework than I have time for. THAT SAID, my read of the code strongly suggests my initial guess about what's wrong with Matthias' case was correct. That means fixing this bug won't actually fix that problem--it can only make it clearer to the user what happened--and it also means that Matthias should be able to fix the problem by fixing the broken sendmail script Mutt is calling. Specifically, Mutt uses alarm() to stop waiting too long for a rogue sendmail process, and the mutt-users post indicated that SIGALRM was what stopped waitpid() from continuing to wait. So, the script that Mutt launched never completed within SendmailWait seconds, whatever that is, and Mutt decided to stop waiting for it. SIGALRM doesn't just happen at random--that's really the only plausible explanation and it matches the code. The rest of what I said then would necessarily have happened: The intermediate fork exited (see more below about sendlib.c), which closed the script's STDIN, which caused sendmail to get EOF reading the message and finally send it. This also explains why waitpid() was getting ECHILD: The child was in fact not dead (see more below under background.c). So to be clear, (I'm as certain as I can be without access to the code that) Matthias' problem is in Mattias' code, not Mutt. Mutt just makes it worse/confusing by doing poor error handling. Now, turning attention back to Mutt's code, a few notes about the bad, the worse, and the ugly... :) system.c: Mostly looks fine, BUT resets the disposition of SIGCHLD to SIG_DFL, which seems right, but differs from signal.c where it sets it to chld_handler (which just sets a flag). See below. signal.c: Sets the handler for SIGCHLD to chld_handler. This is largely pointless. It just sets SigChld = 1. That's generally the correct way to handle a signal, but in this case it does nothing useful since Mutt calls waitpid all over the code. The only effect of setting SigChld is it triggers a redraw in update_bg_menu, but that can easily be eliminated by making the decision to do a redraw depend entirely on unconditionally calling mutt_background_process_waitpid, eliminating the SigChld check. In one place, mutt_background_compose_menu arguably abuses SigChld (the global flag, not the signal) to force a redraw. Not exactly incorrect... just a little gross, and completely unnecessary. Plus, unless I missed something, it appears that if _mutt_system() is ever called, this stops happening anyway, since it resets SIGCHLD's disposition to SIG_DFL (the signal handler will no longer ever execute), which likely proves that the above suggestion is already working. And with SIGCHLD set that way, there's no cause to block it. This is The Right Thing™. background.c: background_edit_landing_page() has this code: while (!done) { wait_rc = waitpid (bg_pid, NULL, WNOHANG); if ((wait_rc > 0) || ((wait_rc < 0) && (errno == ECHILD))) { rc = 0; break; } ... Checking for ECHILD is unnecessary if children are handled properly. Or, said another way, if you're not ignoring SIGCHLD (i.e. SIG_IGN) then the only way you can get ECHILD is if the child IS NOT DEAD. In this context (background editing), that scenario makes no sense. Since you're not ignoring your children, they will zombify until you ask for their status, so their PID can not be reused, and they will still be in the process table until you call waitpid(). The only other potential way you could have this happen is if you called waitpid() twice on the same pid. Don't do that. :) This also eats the exit status of the editor, so mutt can't do anything intelligent with it later (like report that the editor failed/crashed). Worse, mutt_background_compose_menu() calls waitpid and doesn't even check the return value. Not sure what the point of this is, but I'd assume the user would want to know if that process failed or crashed, and that's not happening. sendlib.c: Properly handling the return from waitpid() is a little tricky, and like most code I've seen, this code doesn't. This looks like the cause of the behavior Matthias saw, though it probably won't happen if there isn't also something wrong with his send script. The NATURE of the bug is that it causes Mutt to misreport to the user what happened. Fixing this almost certainly won't fix Matthias' problem, but may at least help make it clear why it's happening. There's also a bit of trickiness in what to do about some of the cases... But in this instance, basically Mutt should report to the user that the process was killed by a signal, what the signal was, and indicate that it can't know if sending the message was actually successful. Trouble is, the way send_msg() forks twice, it provides no good means for communicating the full status back to the parent Mutt process. When you call exit(status), The OS hands the least significant bits of status to you, plus a bunch of extra bits. If an intermediate process gets back an exit status, and then calls exit with that same exit status, a bunch of important information is lost. Mutt could provide a different means of communicating this status back down the process tree to the parent Mutt process, but... doesn't. So... What I would do, if I were going to fix this is: - Get rid of SigChld and the chld_handler, and remove all instances of blocking or ignoring SIGCHLD. Leave SIGCHLD set to SIG_DFL. - Fix all calls to waid or waitpid to look for at least 3 cases: + WIFEXITED (exited "normally" with some exit status) + WIFSIGNALED (killed by a signal, WSTOPSIG() gives the sigal) + "else" (stopped, continued, any OS-specific cases). You could split these out to separate cases, but prolly not useful. These cases aren't the same and shouldn't overlap, so not checking for them produces undefined behavior in Mutt. Some calls to waitpid are not even checking their return status. - drop the extra fork in send_msg() and/or implement a different means of communicating the FULL exit status (not just the exit code) so you can use the W* macros on it - It may make sense that sometimes, you can't do much of anything with the status returned by waitpid(). Mutt already has a mechanism implemented to save those for later, but many of the calls are not even using that. Any time this happens, Mutt is throwing away status that the user NEEDS in order to make an informed decision about how to handle the exception. Though, mostly I think if you can't use it where it was obtained, the code flow needs a redesign. -- Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02 -=-=-=-=- This message is posted from an invalid address. Replying to it will result in undeliverable mail due to spam prevention. Sorry for the inconvenience.
signature.asc
Description: PGP signature