On Wed, Nov 11, 2020 at 02:34:57PM -0600, Derek Martin wrote:
> [...] I would also agree that Mutt's exit status handling is
> incomplete.  That probably needs to be fixed.  I'd need to review
> the code in more detail to be confident about whether
> blocking/ignoring SIGCHLD is appropriate or not.

So I've just had a look at this, and I have some comments.  A proper
fix would be...complicated.  Bottom line, Mutt's code IS definitely
causing part of the headache, and I'd hoped to provide a proposal to
fix it with this message, but it would take more of a rework than I
have time for.

THAT SAID, my read of the code strongly suggests my initial guess
about what's wrong with Matthias' case was correct.  That means fixing
this bug won't actually fix that problem--it can only make it clearer
to the user what happened--and it also means that Matthias should be
able to fix the problem by fixing the broken sendmail script Mutt is
calling.

Specifically, Mutt uses alarm() to stop waiting too long for a rogue
sendmail process, and the mutt-users post indicated that SIGALRM was
what stopped waitpid() from continuing to wait.  So, the script that
Mutt launched never completed within SendmailWait seconds, whatever
that is, and Mutt decided to stop waiting for it.  SIGALRM doesn't
just happen at random--that's really the only plausible explanation
and it matches the code.  The rest of what I said then would
necessarily have happened:  The intermediate fork exited (see more
below about sendlib.c), which closed the script's STDIN, which caused
sendmail to get EOF reading the message and finally send it.  This
also explains why waitpid() was getting ECHILD: The child was in fact
not dead (see more below under background.c).

So to be clear, (I'm as certain as I can be without access to the code
that) Matthias' problem is in Mattias' code, not Mutt.  Mutt just
makes it worse/confusing by doing poor error handling.


Now, turning attention back to Mutt's code, a few notes about the bad,
the worse, and the ugly... :)

system.c:
  Mostly looks fine, BUT resets the disposition of SIGCHLD to SIG_DFL,
  which seems right, but differs from signal.c where it sets it to
  chld_handler (which just sets a flag).  See below.

signal.c:
  Sets the handler for SIGCHLD to chld_handler.  This is largely
  pointless.  It just sets SigChld = 1.  That's generally the correct
  way to handle a signal, but in this case it does nothing useful
  since Mutt calls waitpid all over the code.
  
  The only effect of setting SigChld is it triggers a redraw in
  update_bg_menu, but that can easily be eliminated by making the
  decision to do a redraw depend entirely on unconditionally calling
  mutt_background_process_waitpid, eliminating the SigChld check. In
  one place, mutt_background_compose_menu arguably abuses SigChld (the
  global flag, not the signal) to force a redraw.  Not exactly
  incorrect...  just a little gross, and completely unnecessary.

  Plus, unless I missed something, it appears that if _mutt_system()
  is ever called, this stops happening anyway, since it resets
  SIGCHLD's disposition to SIG_DFL (the signal handler will no longer
  ever execute), which likely proves that the above suggestion is
  already working.   And with SIGCHLD set that way, there's no cause
  to block it.  This is The Right Thing™.  

background.c:
  background_edit_landing_page() has this code:

    while (!done)
    {
      wait_rc = waitpid (bg_pid, NULL, WNOHANG);
      if ((wait_rc > 0) ||
          ((wait_rc < 0) && (errno == ECHILD)))
      {
        rc = 0;
        break;
      }
    ...

  Checking for ECHILD is unnecessary if children are handled properly.
  Or, said another way, if you're not ignoring SIGCHLD (i.e. SIG_IGN)
  then the only way you can get ECHILD is if the child IS NOT DEAD.
  In this context (background editing), that scenario makes no sense.  
  Since you're not ignoring your children, they will zombify until you
  ask for their status, so their PID can not be reused, and they will
  still be in the process table until you call waitpid().  The only
  other potential way you could have this happen is if you called
  waitpid() twice on the same pid.  Don't do that. :)

  This also eats the exit status of the editor, so mutt can't do
  anything intelligent with it later (like report that the editor
  failed/crashed).

  Worse, mutt_background_compose_menu() calls waitpid and doesn't even
  check the return value.  Not sure what the point of this is, but I'd
  assume the user would want to know if that process failed or
  crashed, and that's not happening.  

sendlib.c:
  Properly handling the return from waitpid() is a little tricky, and
  like most code I've seen, this code doesn't.  This looks like the
  cause of the behavior Matthias saw, though it probably won't happen
  if there isn't also something wrong with his send script.  The
  NATURE of the bug is that it causes Mutt to misreport to the user
  what happened.
  
  Fixing this almost certainly won't fix Matthias' problem, but may at
  least help make it clear why it's happening.  There's also a bit of
  trickiness in what to do about some of the cases...  But in this
  instance, basically Mutt should report to the user that the process
  was killed by a signal, what the signal was, and indicate that it
  can't know if sending the message was actually successful.

  Trouble is, the way send_msg() forks twice, it provides no good
  means for communicating the full status back to the parent Mutt
  process.  When you call exit(status), The OS hands the least
  significant bits of status to you, plus a bunch of extra bits.  If
  an intermediate process gets back an exit status, and then calls
  exit with that same exit status, a bunch of important information is
  lost.  Mutt could provide a different means of communicating this
  status back down the process tree to the parent Mutt process, but...
  doesn't.

So... What I would do, if I were going to fix this is:

 - Get rid of SigChld and the chld_handler, and remove all instances
   of blocking or ignoring SIGCHLD.  Leave SIGCHLD set to SIG_DFL.
 
 - Fix all calls to waid or waitpid to look for at least 3 cases:
   + WIFEXITED (exited "normally" with some exit status)
   + WIFSIGNALED (killed by a signal, WSTOPSIG() gives the sigal)
   + "else" (stopped, continued, any OS-specific cases).  You could
     split these out to separate cases, but prolly not useful.

   These cases aren't the same and shouldn't overlap, so not checking
   for them produces undefined behavior in Mutt.

   Some calls to waitpid are not even checking their return status.
 
 - drop the extra fork in send_msg() and/or implement a different
   means of communicating the FULL exit status (not just the exit
   code) so you can use the W* macros on it
 
 - It may make sense that sometimes, you can't do much of anything
   with the status returned by waitpid().  Mutt already has a
   mechanism implemented to save those for later, but many of the
   calls are not even using that.  Any time this happens, Mutt is
   throwing away status that the user NEEDS in order to make an
   informed decision about how to handle the exception.  Though,
   mostly I think if you can't use it where it was obtained, the code
   flow needs a redesign.


-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.

Attachment: signature.asc
Description: PGP signature

Reply via email to