On Wed, Feb 8, 2017 at 10:44 AM, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Wed, Feb 8, 2017 at 10:36 AM, Erik Bray <erik.m.b...@gmail.com> wrote:
>> On Tue, Feb 7, 2017 at 6:49 PM, Jeroen Demeyer <jdeme...@cage.ugent.be> 
>> wrote:
>>> On 2017-02-07 17:30, Erik Bray wrote:
>>>>
>>>> I've been having a problem lately when running Sage's test suite on
>>>> Cygwin (i.e. sage -t -a).
>>>>
>>>> Several of the tests that use Maxima are spinning up Maxima processes
>>>> (I guess interacted with via pexpect?) and not killing them.
>>>
>>>
>>> This is probably Cygwin-specific. It would help if you could give some more
>>> details. For example: is the problem reproducible or does it only happen
>>> sometimes? Do you know which files cause the problem? Do the doctests
>>> actually pass? Does Cygwin have something like strace which might help to
>>> debug this?
>>
>> Yes, almost certainly Cygwin-specific.  Though I'm not sure when it
>> started--this didn't happen when I was running the tests a few months
>> ago.
>>
>> It's reproducible insofar as every time I run the full test suite it
>> happens.  I haven't pinpointed any specific tests that cause the
>> problem--that's mainly what I was asking for help with. I.e. what are
>> some tests that use Maxima?
>
> To answer this question for myself--as the discussion on what Maxima
> is used for in Sage pointed me in the right direction--the
> sage/calculus tests reliably start up at least 3 maxima processes,
> which then run away with my CPU even after those tests are
> finished.  I'll see if I can see what exactly they are doing.

I've gained a little insight into the problem.  On one hand, I would
say there are some bugs in ecl, but on the other hand it can't be
entirely blamed as we're veering into the territory of undefined
behavior here.

The TL;DR version is that when `maxima.quit()` (or something similar)
is called, `SagePtyProcess.close()` calls `self.fileobj.close()`.
This closes the file for the master pty from the forkpty that started
the child process, resulting in an (unhandled, afaict) SIGHUP, and
subsequently broken stdio streams.  *How* exactly they are broken
though seems to be platform dependent, leading to different behaviors
(some of which I think are buggy).  In turn, there are some buglets in
ECL's error handling on both Cygwin *and* Linux.  The bugs on Linux
happen to be a bit nicer so it allows Maxima to exit quickly.  The bug
on Cygwin, on the other hand, sends it into an infinite loop of
select() calls.  Even though Sage tries to kill the process, this loop
is such a CPU drain that once you get 2 or 3 of them going
simultaneously it bogs down the system.

When the pty is closed, if Maxima's REPL is waiting for user input,
it's in a blocking read() on stdin.  This read exits with an error
status, triggering an exception in ECL, which drops into the LISP
debugger.  On the way though, it passes through Maxima's custom debug
handler, which prints a message on how to disable Maxima's debug
handler, then passes execution back to Maxima's REPL.

While printing that message it goes through some intermediate steps,
but ultimately it ends up in this function:

https://gitlab.com/embeddable-common-lisp/ecl/blob/310b51b677aef80f39bdd784e958b5727bcf8c5e/src/c/file.d#L3347

That function calls fwrite() to write a single character and uses the
return value of fwrite() to determine if a write error occurred.  This,
however, is not reliable.  On Linux, in fact, fwrite() is returning 1
here, though errno gets set to 5 (EIO).  I'm inclined to call that a
bug in my glibc (2.19, FWIW), but it's not clear exactly what the
behavior should be.  ISTM calling ferror() here instead is the only
truly reliable way to determine if an error occurred in fwrite().  So
I think this is "bug" 1 in ECL (and possibly glibc).
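
For illustration, here is a minimal sketch of the kind of check I mean.
`put_char_checked` is a hypothetical helper, not ECL's actual code: it
treats the write as failed if fwrite() reports a short write *or* the
stream's error indicator is set.

```c
#include <stdio.h>

/* Hypothetical helper (not taken from ECL): write one byte to `stream`
 * and report failure if either fwrite() returns a short count or the
 * stream's error indicator is set.  Returns 0 on success, -1 on error. */
static int put_char_checked(FILE *stream, unsigned char c)
{
    size_t written = fwrite(&c, 1, 1, stream);

    /* Don't trust the return value alone: also consult ferror(). */
    if (written < 1 || ferror(stream))
        return -1;
    return 0;
}
```

Note that the error indicator is sticky (it stays set until clearerr()
is called), so this also reports errors from earlier operations on the
same stream.  On a fully buffered stream a write error may also only
surface when the buffer is flushed.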

The fact that an error isn't detected here on Linux is good news.  It
finishes "printing" (with errors) Maxima's debug message, then returns
to the input prompt.  It then immediately reads an EOF and exits.  No
problem.

On Cygwin, however, this is where everything blows up.  On Cygwin, the
error on fwrite() *is* detected.  This results in recursively looping
back into ECL's error handler.  Fortunately it has a mechanism to
prevent re-entering a custom debug handler if an error occurred in
that debug handler, so it doesn't re-enter Maxima's debug handler and
instead skips straight to ECL's default debugger.  One of the first
things it does is call a function called (clear-input), which is
meant to clear any pending input waiting on stdin.  The implementation
for this (for a stdio stream) is here:

https://gitlab.com/embeddable-common-lisp/ecl/blob/310b51b677aef80f39bdd784e958b5727bcf8c5e/src/c/file.d#L3401

It goes into this dastardly loop which first calls a function called
flisten() which typically calls select() on stdin to check if it's
ready for reading.  Then it calls getc(stdin) and throws away the
result.  It doesn't check the return value of getc() for an error
condition.  In flisten() it *does* check if the file is at EOF.  On
Linux, if it gets to this point (after fixing the first bug),
feof(stdin) returns 1.  On Cygwin, on the other hand, it does not set
EOF on stdin, but ferror(stdin) does return 1 (in other words, it
doesn't treat the stream as at EOF, though it is in an error
condition).  I think this is a bug in Cygwin, but that's also a bit
unclear.  According to [1]:

"If a read error occurs, the error indicator for the stream shall be
set, fgetc() shall return EOF, and shall set errno to indicate the
error."

It does not explicitly say that the stream's end-of-file indicator
should be set, even though the function returns EOF.  Linux is setting
the end-of-file indicator even on error, while Cygwin is not.  I think
it would be better if it did, though I don't think it's strictly wrong
that it doesn't.
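
Since the two indicators can disagree like this, portable code that gets
EOF back from getc() has to consult both to know what happened.  A
minimal sketch, using a hypothetical `classify_eof` helper:

```c
#include <stdio.h>

/* Why did getc() return EOF?  (Hypothetical helper for illustration.) */
enum eof_reason {
    EOF_REASON_END_OF_FILE,
    EOF_REASON_READ_ERROR,
    EOF_REASON_UNKNOWN   /* neither indicator set */
};

static enum eof_reason classify_eof(FILE *stream)
{
    /* Check the error indicator first: POSIX does not require a read
     * error to also set the end-of-file indicator, so a stream can be
     * in error without being at end-of-file. */
    if (ferror(stream))
        return EOF_REASON_READ_ERROR;
    if (feof(stream))
        return EOF_REASON_END_OF_FILE;
    return EOF_REASON_UNKNOWN;
}
```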

Anyhow, because this loop doesn't check for errors, it goes on
infinitely and busily until the process is killed.  Adding an ferror()
check in the flisten() function fixes it.
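
Roughly, the loop and the fix look like the sketch below.  This is not
ECL's code: `listen_sketch` and `clear_input_sketch` are hypothetical
stand-ins for flisten() and the stdio clear-input loop.  The point is
the ferror() test in `listen_sketch`; without it, a stream whose error
indicator is set but whose end-of-file indicator is not (the Cygwin
case) keeps the loop spinning, because getc() just keeps returning EOF.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical stand-in for flisten(): returns 1 if the stream looks
 * readable, 0 otherwise.  Unlike a real implementation it ignores data
 * that may already be sitting in stdio's buffer. */
static int listen_sketch(FILE *fp)
{
    /* The fix: treat an error the same as end-of-file. */
    if (feof(fp) || ferror(fp))
        return 0;

    int fd = fileno(fp);
    if (fd < 0)
        return 0;

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval timeout = {0, 0};   /* poll only, never block */

    return select(fd + 1, &readfds, NULL, NULL, &timeout) > 0;
}

/* Discard pending input; returns the number of bytes thrown away. */
static size_t clear_input_sketch(FILE *fp)
{
    size_t discarded = 0;

    while (listen_sketch(fp)) {
        int c = getc(fp);
        /* Like the loop described above, don't stop on EOF here; rely on
         * listen_sketch() noticing feof()/ferror() on the next pass. */
        if (c != EOF)
            discarded++;
    }
    return discarded;
}
```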

Aside from the fixes directly to ECL, I think this can be worked
around in Sage by not explicitly closing the master pty until the
process has exited.  This could be done, among other possibilities, by
modifying SagePtyProcess.terminate_async to accept a callback function
that is called once the child process has terminated (as detected by
handling SIGCHLD).
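
A synchronous simplification of that idea, written as a C sketch rather
than against Sage's Python classes: `terminate_then_close` is
hypothetical and blocks in waitpid() instead of using a SIGCHLD
callback, but it shows the ordering that matters.  The master pty
descriptor is closed only after the child has been reaped, so the child
never gets the hangup while it is still running.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask the child to terminate, wait until it has
 * actually exited, and only then close the parent's end of the pty.
 * Returns the child's wait status, or -1 on failure (master_fd is left
 * open if the signal could not be sent). */
static int terminate_then_close(pid_t pid, int master_fd)
{
    int status = 0;
    pid_t reaped;

    /* ESRCH just means the child is already gone. */
    if (kill(pid, SIGTERM) == -1 && errno != ESRCH)
        return -1;

    /* Retry if a signal interrupts the wait. */
    do {
        reaped = waitpid(pid, &status, 0);
    } while (reaped == -1 && errno == EINTR);

    /* Only now is it safe to drop the master side. */
    close(master_fd);
    return reaped == pid ? status : -1;
}
```

A real version would also want a timeout and escalation to SIGKILL for
a child that ignores SIGTERM.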

Erik

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fgetc.html

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.