On Sun, Aug 23, 2009 at 9:17 AM, Brian Granger <ellisonbg....@gmail.com> wrote:
>
>> In the current architecture, a twistd daemon spawns a notebook server
>> which is responsible for doing "sage" stuff. twistd is fully
>> asynchronous, but the notebook process itself is a pexpect based
>> blocking process connected with pipes to twistd. As such, the block
>> on read by pexpect precludes the sage process servicing asynchronous
>> events.
>>
>> IMHO, this architecture is incorrect and limited... Perhaps this is
>> part of what is being rethought... if not, I believe it should be.
>
> As an avid Twisted user, I too thought this initially (why use pexpect, when
> you could use Twisted). But after looking at this issue further, I think
> using pexpect is not that bad. Here is why:
>
> 1. If you were to use Twisted, while the process was running user's code,
> Twisted would still block. Using threads (running the Twisted event loop in
> a thread) only partially solves this problem as the python interpreter can't
> switch threads while non-GIL-releasing C/C++ code is running. We ran into
> this in early versions of IPython's parallel stuff - it worked great (asynch)
> until the second we went to do something like diagonalize a matrix using
> scipy. Then everything would block. We have had to work very hard to get
> around this GIL induced limitation of using Twisted.
>
> 2. Both dsage and parallel ipython clients use Twisted. For this to work,
> these clients need to run the Twisted reactor in a different thread than user
> code is executed. Currently, these work fine in the notebook, because they
> can start the reactor in this way by themselves. If the notebook itself used
> Twisted, great care would need to be used to make sure these things still
> worked. You would have to run user code in the main thread and run all the
> twisted stuff in a different thread. User code needs to be in the main
> thread if you want users to be able to run real GUI code (I do this
> sometimes!).

The Sage notebook is a lot like the command line tools bash or screen or
even ssh. The pexpect library is just a collection of Python bindings
to pseudotty that make it easy for one process to spawn and run
subprocesses. Moreover, as long as the worksheet and the notebook
server are distinct processes (as they should be, IMHO), the difference
between using pexpect, or xmlrpc, or anything else, for them to
communicate is completely and totally irrelevant, since it is a black
box to the entire rest of the program.

Also, to correct another possible misconception, communication between
a process and a subprocess using pexpect is not blocking. The master
process can listen for however long it wants to the subprocess, then
stop listening. That's why when you do

    for i in range(10):
        sleep(1)
        print(i)

in the Sage notebook, you see the output as it is computed. The
notebook server just uses pexpect to "peek" at the output of the
subprocess doing the actual work and look to see what has been output
so far.

Another misconception is that pexpect is restricted to local processes.
It's easy to control a process via pexpect over the network via ssh.
This has been in Sage since 2005, and can already be used for worksheet
subprocesses *now* as long as you have a shared filesystem (just use the
server_pool option).

Here is an example on the command line. I have ssh keys set up so I can
do "ssh sage.math.washington.edu" and log in without typing a password.
I start Sage on my laptop in a coffee shop, and make a connection to a
remote Sage that gets started running on sage.math, and I run a
calculation.

flat:sageuse wstein$ sage
----------------------------------------------------------------------
| Sage Version 4.1.1, Release Date: 2009-08-14                       |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: s = Sage(server="sage.math.washington.edu")
No remote temporary directory (option server_tmpdir) specified, using /tmp/ on sage.math.washington.edu
sage: s.eval("2+2")
'4'
sage: s.eval("os.system('uname -a')")
'Linux sage.math.washington.edu 2.6.24-23-server #1 SMP Wed Apr 1 22:14:30 UTC 2009 x86_64 GNU/Linux\n0'
sage:

The above used pexpect. You can even interact with remote objects:

sage: e = s("EllipticCurve([1..5])")
sage: e.rank()
1

You can do the same with Mathematica, etc. by the way:

sage: s = Mathematica(server="sage.math.washington.edu")
sage: s("Factorial[50]")
30414093201713378043612608166064768844377641568960512000000000000

Compare my laptop to sage.math's mathematica:

sage: s("Timing[Factorial[10^6]][[1]]")   # sage.math
1.1099999999999999
sage: mathematica("Timing[Factorial[10^6]][[1]]")  # laptop
0.8902620000000001

(I guess Mathematica 7.0 is faster at factorials than Mathematica 6.0.)

This tests latency:

sage: timeit('s.eval("2+2")')   # over web via ssh
5 loops, best of 3: 56.3 ms per loop
sage: timeit('mathematica.eval("2+2")')  # local
625 loops, best of 3: 209 µs per loop

Of course latency is long over the net, since I'm in a random coffee
shop.

This remote server stuff has been in sage since 2005, and hasn't been
changed in the slightest bit since then. That's why I'm advertising it
now, since it would be cool to see some people work on it and improve
it. For example, for people without ssh keys, one could *easily* make
it so the following works:

sage: s = Mathematica(server="sage.math.washington.edu")
password: xxx
sage: s = Mathematica(server="w...@sage.math.washington.edu")
password: xxx

Scripted logins via pexpect are in fact the raison d'etre for pexpect in
the first place, and would be easy to add. There are also bound to be
all kinds of subtle issues with server=... that haven't been found due
to lack of use.
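
As a rough illustration of the scripted-login idea, and of listening to
a subprocess only for as long as you want, here is a minimal sketch. It
is an assumption-laden toy, not Sage or pexpect code: it uses only the
Python standard library (pty, select, termios), it talks to a tiny
stand-in child process rather than a real ssh, and the names FAKE_LOGIN,
spawn_on_pty, read_until and scripted_login are made up for this
example. Unix only.

```python
import os
import pty
import select
import subprocess
import sys
import termios
import time

# Hypothetical stand-in for "ssh host": prompts for a password, reads one
# line, and reports whether it matched.
FAKE_LOGIN = (
    "import sys\n"
    "sys.stdout.write('password: ')\n"
    "sys.stdout.flush()\n"
    "pw = sys.stdin.readline().strip()\n"
    "print('welcome' if pw == 'xxx' else 'denied')\n"
)


def spawn_on_pty(argv):
    """Start argv with stdin/stdout/stderr attached to a new pseudo-terminal."""
    master_fd, slave_fd = pty.openpty()
    # Turn off echo so that what we send is not mixed into what we read back.
    attrs = termios.tcgetattr(slave_fd)
    attrs[3] &= ~termios.ECHO
    termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child holds its own copies
    return proc, master_fd


def read_until(master_fd, marker, timeout):
    """Collect output until `marker` (bytes) appears or `timeout` seconds pass.

    Returns (text_so_far, found). The caller never waits longer than
    `timeout`, whatever the child is doing.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while marker not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        ready, _, _ = select.select([master_fd], [], [], remaining)
        if not ready:
            break  # timed out: stop listening
        try:
            chunk = os.read(master_fd, 4096)
        except OSError:
            break  # child closed the terminal (EIO on Linux)
        if not chunk:
            break  # end of file
        buf += chunk
    return buf.decode(errors="replace"), marker in buf


def scripted_login(password):
    """Wait for the password prompt, answer it, and report success."""
    proc, fd = spawn_on_pty([sys.executable, "-c", FAKE_LOGIN])
    try:
        _, saw_prompt = read_until(fd, b"password: ", timeout=10)
        if not saw_prompt:
            return False
        os.write(fd, password.encode() + b"\n")
        reply, _ = read_until(fd, b"welcome", timeout=10)
        return "welcome" in reply
    finally:
        os.close(fd)
        if proc.poll() is None:
            proc.kill()
        proc.wait()


if __name__ == "__main__":
    print(scripted_login("xxx"))
```

The select() call with a timeout is what keeps the master from blocking:
read_until returns whatever has been output so far once the deadline
passes. None of this exercises the real server=... code path, of
course.
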
A good test would be to try to force the gap or maxima interfaces to run
100% remotely (by editing interfaces/gap.py or interfaces/maxima.py),
then try to run the Sage test suite and see what goes wrong.

With respect to the notebook, there is currently some reliance on a
shared filesystem for the worksheet processes. I think this could be
easily fixed via some slight redesign, and I'll do this in October. I
could even make it so that there is an option for a given worksheet (set
in, say, a worksheet configuration pane) for that worksheet to run as a
given user on a given remote system. Then whenever you use that
worksheet, you would have to log in to the remote system to start it
running, and afterwards all computations would happen using the default
"sage" command on that remote system over ssh. I think implementing
this would be completely straightforward given the current notebook
design, and already this would provide a level of flexibility and power
that rivals anything the codenode design or anybody else has suggested.

In case the above wasn't clear, one could go to, say,
https://sagenb.org, log in, but then have persistent worksheet processes
that run on sage.math.washington.edu, or any other powerful specific
computer you have an account on. This would give you access to your own
build of Sage, commercial software on that machine, etc.

So there is still some potential to the pseudotty approach to
controlling processes. The main drawback in my mind is that it works
differently (and maybe not so well) on Windows (though it does actually
work, but via the "Console API").

 -- William