On Wed, Feb 8, 2017 at 10:44 AM, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Wed, Feb 8, 2017 at 10:36 AM, Erik Bray <erik.m.b...@gmail.com> wrote:
>> On Tue, Feb 7, 2017 at 6:49 PM, Jeroen Demeyer <jdeme...@cage.ugent.be>
>> wrote:
>>> On 2017-02-07 17:30, Erik Bray wrote:
>>>>
>>>> A problem I've been having lately when running Sage's test suite on
>>>> Cygwin (i.e. sage -t -a).
>>>>
>>>> Several of the tests that use Maxima are spinning up Maxima processes
>>>> (I guess interacted with via pexpect?) and not killing them.
>>>
>>>
>>> This is probably Cygwin-specific. It would help if you could give some more
>>> details. For example: is the problem reproducible or does it only happen
>>> sometimes? Do you know which files cause the problem? Do the doctests
>>> actually pass? Does Cygwin have something like strace which might help to
>>> debug this?
>>
>> Yes, almost certainly Cygwin-specific. Though I'm not sure when it
>> started--this didn't happen when I was running the tests a few months
>> ago.
>>
>> It's reproducible insofar as every time I run the full test suite it
>> happens. I haven't pinpointed any specific tests that cause the
>> problem--that's mainly what I was asking for help with. I.e. what are
>> some tests that use Maxima?
>
> To answer this question for myself--as the discussion on what Maxima
> is used for in Sage pointed me in the right direction--the
> sage/calculus tests reliably start up at least 3 maxima processes,
> which then run away with my CPU even after the those tests are
> finished. I'll see if I can see what exactly they are doing.

I've gained a little insight into the problem. On one hand, I would say there are some bugs in ECL, but on the other hand it can't be entirely blamed, as we're veering into the territory of undefined behavior here.

The TL;DR version is that when `maxima.quit()` (or something similar) is called, `SagePtyProcess.close()` calls `self.fileobj.close()`. This closes the file for the master pty from the forkpty that started the child process, resulting in an (unhandled, afaict) SIGHUP and, subsequently, broken stdio streams. *How* exactly they are broken, though, seems to be platform-dependent, leading to different behaviors (some of which I think are buggy).

In turn, there are some buglets in ECL's error handling on both Cygwin *and* Linux. The bugs on Linux happen to be a bit nicer, so they allow Maxima to exit quickly. The bug on Cygwin, on the other hand, sends it into an infinite loop of select() calls. Even though Sage tries to kill the process, this loop is such a CPU drain that once you get 2 or 3 of them going simultaneously, it bogs down the system.

When the pty is closed, if Maxima's REPL is waiting for user input, it's in a blocking read() on stdin. This read exits with an error status, triggering an exception in ECL, which drops into the LISP debugger. On the way, though, it passes through Maxima's custom debug handler, which prints a message on how to disable Maxima's debug handler, then passes execution back to Maxima's REPL. There are some intermediate steps in printing that message, but ultimately it goes into this function:

https://gitlab.com/embeddable-common-lisp/ecl/blob/310b51b677aef80f39bdd784e958b5727bcf8c5e/src/c/file.d#L3347

That function calls fwrite() for one character and uses fwrite()'s return value to determine whether a write error occurred. This, however, is not reliable. On Linux, in fact, fwrite() returns 1 here, even though errno gets set to 5 (EIO).
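To make the idea of a more defensive single-character write concrete, here is a minimal C sketch that consults the stream's error indicator via ferror() in addition to fwrite()'s item count. The helper name write_byte_checked() is hypothetical and this is not ECL's actual code; it assumes, as reported here, that the error indicator does get set when the underlying write fails even if fwrite() itself reports success.

```c
#include <stdio.h>

/* Hypothetical helper (not ECL's actual code): write one character to a
 * stdio stream. Returns 0 on success and -1 on failure. */
static int write_byte_checked(FILE *stream, unsigned char c)
{
    size_t written = fwrite(&c, 1, 1, stream);

    /* Don't trust the item count alone: per the report above, fwrite() can
     * return 1 even though the underlying write failed with EIO. The error
     * indicator is sticky, so an earlier failure on this stream also counts
     * as a failure here. */
    if (written != 1 || ferror(stream))
        return -1;
    return 0;
}
```

Treating a sticky error indicator as failure seems appropriate for a stream whose pty has gone away: once it is broken, every later write should be reported as failing too.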
I'm inclined to call fwrite() returning 1 there a bug in my glibc (2.19, FWIW), but it's not clear exactly what the behavior should be. ISTM calling ferror() here instead is the only truly reliable way to determine whether an error occurred in fwrite(). So I think this is "bug" 1 in ECL (and possibly glibc).

The fact that an error isn't detected here on Linux is good news. It finishes "printing" (with errors) Maxima's debug message, then returns to the input prompt. It then immediately reads an EOF and exits. No problem.

On Cygwin, however, this is where everything blows up. On Cygwin, the error on fwrite() *is* detected. This results in recursively looping back into ECL's error handler. Fortunately, ECL has a mechanism to prevent re-entering a custom debug handler if an error occurred in that debug handler, so it doesn't re-enter Maxima's debug handler and instead skips straight to ECL's default debugger.

One of the first things it does is call (clear-input), which is meant to clear any pending input waiting on stdin. The implementation for this (for a stdio stream) is here:

https://gitlab.com/embeddable-common-lisp/ecl/blob/310b51b677aef80f39bdd784e958b5727bcf8c5e/src/c/file.d#L3401

It goes into this dastardly loop: each iteration first calls a function called flisten(), which typically calls select() on stdin to check whether it's ready for reading, then calls getc(stdin) and throws away the result. It doesn't check the return value of getc() for an error condition. flisten() *does* check whether the file is at EOF, but not whether it is in an error state.

On Linux, if it gets to this point (after fixing the first bug), feof(stdin) returns 1. On Cygwin, on the other hand, the end-of-file indicator on stdin is not set, but ferror(stdin) does return 1 (in other words, it doesn't treat the stream as being at EOF, though it is in an error condition). I think this is a bug in Cygwin, but that's also a bit unclear.
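The clear-input loop described above can be sketched in C with an ferror() check added to the listen step. This is a simplified, hypothetical analogue: stream_listen() and stream_clear_input() are made-up names, not ECL functions. The sketch polls the descriptor with a zero-timeout select(), ignores any data already buffered inside the FILE, and also checks getc()'s return value, which the loop described above does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical analogue of ECL's flisten() for a stdio stream; not ECL's
 * actual code. Returns 1 if input appears to be available, else 0.
 * NOTE: data already buffered inside the FILE is ignored in this sketch. */
static int stream_listen(FILE *fp)
{
    /* The ferror() test is the fix discussed in this thread: a stream that
     * is in an error state but not at EOF (the Cygwin case) is treated as
     * having no input, instead of looking "ready" forever. */
    if (feof(fp) || ferror(fp))
        return 0;

    int fd = fileno(fp);
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval timeout = {0, 0};  /* zero timeout: poll, never block */
    return select(fd + 1, &readfds, NULL, NULL, &timeout) > 0;
}

/* Discard pending input, stopping on EOF or on a read error. */
static void stream_clear_input(FILE *fp)
{
    while (stream_listen(fp)) {
        if (getc(fp) == EOF)
            break;  /* feof(fp) or ferror(fp) is now set */
    }
}
```

In this sketch either check alone is enough to stop the busy loop: the ferror() test makes the listen step report "no input" once the stream is broken, and checking getc()'s result stops the loop on the first failed read.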
According to [1]:

"If a read error occurs, the error indicator for the stream shall be set, fgetc() shall return EOF, and shall set errno to indicate the error."

It does not explicitly say that the stream's end-of-file indicator should be set, even though the function returns EOF. Linux sets the end-of-file indicator even on error, while Cygwin does not. I think it would be better if it did, though I don't think it's strictly wrong that it doesn't.

Anyhow, because this loop doesn't check for errors, it goes on infinitely and busily until the process is killed. Adding an ferror() check in the flisten() function fixes it.

Aside from the fixes directly to ECL, I think this can be worked around in Sage by not explicitly closing the master pty until the process has exited. Among other possibilities, this could be done by modifying SagePtyProcess.terminate_async to accept a callback function to be called after the child process is terminated, by handling SIGCHLD.

Erik

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fgetc.html