On Tue, Mar 1, 2011 at 4:32 AM, Johan S. R. Nielsen
<j.s.r.niel...@mat.dtu.dk> wrote:
> On Mar 1, 10:13 am, Robert Bradshaw <rober...@math.washington.edu>
> wrote:
>> On Tue, Mar 1, 2011 at 12:48 AM, Johan S. R. Nielsen
>> <j.s.r.niel...@mat.dtu.dk> wrote:
>> > On Feb 23, 11:03 pm, Jason Grout <jason-s...@creativetrax.com> wrote:
>> >> On 2/23/11 3:56 PM, Robert Bradshaw wrote:
>>
>> >> > On Wed, Feb 23, 2011 at 1:47 PM, Jason Grout
>> >> > <jason-s...@creativetrax.com> wrote:
>> >> >> On 2/23/11 3:06 PM, Robert Bradshaw wrote:
>>
>> >> >>> On Wed, Feb 23, 2011 at 11:34 AM, William Stein <wst...@gmail.com>
>> >> >>> wrote:
>>
>> >> >>>> On Wed, Feb 23, 2011 at 10:57 AM, Jason Grout
>> >> >>>> <jason-s...@creativetrax.com> wrote:
>>
>> >> >>>>> On 2/23/11 12:28 PM, William Stein wrote:
>>
>> >> >>>>>> At lunch yesterday Robert Bradshaw made the interesting suggestion
>> >> >>>>>> to read the docs for importlib
>> >> >>>>>> (http://docs.python.org/dev/library/importlib.html) and write a
>> >> >>>>>> customized import hook, so that every time during Sage startup
>> >> >>>>>> that a module is imported, the import is done from a single big
>> >> >>>>>> in-memory zip file instead of done using the filesystem. If this
>> >> >>>>>> can be made to work, it would be a huge win for slow filesystems.
>> >> >>>>>> The basic problem is that some filesystems are fast but have huge
>> >> >>>>>> *latency*.
>>
>> >> >>>>> Is it a big win primarily because the zip file contents can be read
>> >> >>>>> in and cached by us? I'm just trying to understand it better.
>>
>> >> >>>> Which would you rather do on a high latency filesystem:
>>
>> >> >>>> (1) Read/stat 20,000 little files, or
>> >> >>>> (2) Read exactly one 40MB file.
>>
>> >> >>>>> Is this the same idea as Jar files in java?
>>
>> >> >>>> I don't know.
>>
>> >> >>> Yep. In that case the "high latency file system" was a webserver.
>>
>> >> >>>>> You mean like http://docs.python.org/library/zipimport.html?
>>
>> >> >>>> Cool.
>>
>> >> >>> Note that this should just involve putting the zip file first in the
>> >> >>> python path.
>>
>> >> >>>> I don't know for a fact that Robert Bradshaw's suggestion will be a
>> >> >>>> big win, since nobody has tried this yet. But I'm optimistic. The
>> >> >>>> idea would be to make a zip archive of
>> >> >>>> $SAGE_ROOT/local/lib/python/site-packages (say), and do *all* imports
>> >> >>>> using that massive zip archive.
>>
>> >> >>> I'm optimistic too. This would, of course, make more sense for
>> >> >>> system-wide installs than development versions, but the former are
>> >> >>> more likely to be on a non-local filesystem anyways.
>>
>> >> >> Sounds like it is time for a trial!
>>
>> >> >> I created a directory of 2000 .py files and an __init__.py file to
>> >> >> make it a module
>>
>> >> >> for i in range(2000):
>> >> >>     with open('importtest/test_%s.py'%i,'w') as f:
>> >> >>         f.write("VALUE=%s\n"%i)
>> >> >> with open('importtest/__init__.py','w') as f:
>> >> >>     f.write(' ')
>>
>> >> >> Then I imported each of these so that .pyc files were created.
>>
>> >> >> for i in range(2000):
>> >> >>     exec 'import importtest.test_%s'%i
>>
>> >> >> Okay, then I copied the directory and zipped it up (in the shell now):
>>
>> >> >> $ cp -r importtest zipimporttest
>> >> >> $ zip -r tmp.zip zipimporttest
>> >> >> $ rm -rf zipimporttest
>>
>> >> >> One nice side effect is that the zip file is less than one MB, while
>> >> >> the directory of python files is around 16M.
>>
>> >> >> Now for the test. Here are my two scripts.
>> >> >> One imports each module in the directory and adds up the VALUE in
>> >> >> each module:
>>
>> >> >> % cat mytest.py
>> >> >> s=0
>> >> >> for i in range(2000):
>> >> >>     exec 'import importtest.test_%s as tt'%i
>> >> >>     s+=tt.VALUE
>> >> >> print s
>>
>> >> >> The other first adds the zip to the front of sys.path and then does
>> >> >> the same imports and summing, but using the zipped module:
>>
>> >> >> % cat mytestzip.py
>> >> >> import sys
>> >> >> sys.path.insert(0,'./tmp.zip')
>> >> >> s=0
>> >> >> for i in range(2000):
>> >> >>     exec 'import zipimporttest.test_%s as tt'%i
>> >> >>     s+=tt.VALUE
>> >> >> print s
>>
>> >> >> And now for the timings:
>>
>> >> >> % time sage -python mytest.py
>> >> >> Detected SAGE64 flag
>> >> >> Building Sage on OS X in 64-bit mode
>> >> >> 1999000
>> >> >> sage -python mytest.py 0.26s user 1.47s system 75% cpu 2.282 total
>>
>> >> >> % time sage -python mytestzip.py
>> >> >> Detected SAGE64 flag
>> >> >> Building Sage on OS X in 64-bit mode
>> >> >> 1999000
>> >> >> sage -python mytestzip.py 0.21s user 0.11s system 99% cpu 0.327 total
>>
>> >> >> It looks like the zip is a clear winner in this case. And this is
>> >> >> with the directory presumably in the FS cache.
>>
>> >> > Cool. Given the CPU was pegged at 99%, have you tried using an
>> >> > uncompressed zip file? It'd have more data to read, but less to do
>> >> > with it once it's read.
>>
>> >> In my case, using zip -0 (no compression) gives:
>>
>> >> % time sage -python mytestzip.py
>> >> Detected SAGE64 flag
>> >> Building Sage on OS X in 64-bit mode
>> >> 1999000
>> >> sage -python mytestzip.py 0.20s user 0.10s system 99% cpu 0.309 total
>>
>> >> So just a slight savings.
>>
>> >> Jason
>>
>> > I had an orthogonal thought, though I'm not sure it's completely
>> > possible.
>> > Instead of actually loading the real functions/classes etc.,
>> > couldn't we fast-load (or generate) stub-versions of all these, which
>> > when called would load and replace themselves with the real version
>> > and then run it? I'm not completely sure it's possible with Python,
>> > but Python is pretty flexible so perhaps there is a way; in
>> > particular, I don't know how Python supports reflection for adding new
>> > functions to the namespace dynamically. Also, the doc-strings and
>> > search*-functions should also somehow be thought into it.
>> > If it's possible, as far as I can see, the user would not notice this
>> > (except for a minute overhead the first time a function was called),
>> > and only the very small fraction of used modules would be loaded each
>> > session. Furthermore, because the stub-functions were in the
>> > namespace, tab-completion would still work.
>> > The stub-versions could either come from auto-generated python-files
>> > from when compiling Sage and loaded by the usual module-loader, or
>> > perhaps by some Python-function which used a compile-time-generated
>> > listing of all functions/classes etc. to create these
>> > wrapper-functions at run-time and add them to the namespace.
>>
>> See lazy-import. Doing this for everything may incur significant
>> delays the first time a function is called (rather than before the
>> prompt) and there are issues with Sage being fragile about the order
>> in which some modules are implemented, but yes, it's possible and
>> largely implemented.
>>
>> - Robert
>
> Nice! I wasn't aware of this module. When you get a good idea,
> there's a good chance that someone else thought of it before ;-) I
> like the fact that one can dynamically hack into an object's
> namespace :-D However, lazy_import seems not to be used much (only 2-3
> places) currently in the Sage startup (or did I grep wrongly?). Was it
> never the intention or is it due to the overhead?

No, it's because it barely got into Sage (well, the new and improved
version at least).

> Also, I didn't proofread the entire lazy_import code, but the
> implementation seems to differ from my idea in a significant way: the
> LazyImport object keeps wrapping the imported objects. My thought was
> that the first time the imported object was accessed, it would replace
> itself with the original in the global namespace, and then forward the
> call. This way, there will be zero overhead in all later calls.

It does if the original namespace is available. See the _get_object method.

> I agree that it probably shouldn't be done for central functions and
> modules, but if the lasting (after first call) overhead could be
> completely removed, wouldn't it be a good idea to apply for more or
> less _all_ satellite-modules?

As long as we don't get to the point that the first call takes a large
amount of time, yes. There's also the (unimplemented) idea of actively
loading lazily imported objects in a background thread during idle
time.

- Robert
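
For anyone following along, the self-replacing behaviour discussed above (a stub that swaps itself out of the namespace the first time it is used, so later calls pay no wrapper overhead) might look roughly like the sketch below. This is not Sage's lazy_import code. The class LazyStub and its constructor arguments are made up for illustration; only the method name _get_object is borrowed from the message above, and its body here is an assumption about one way to do it.

```python
import importlib


class LazyStub:
    """Hypothetical placeholder for an object that has not been imported yet.

    Illustration only; this is not Sage's LazyImport class.
    """

    def __init__(self, module_name, attr_name, namespace, alias=None):
        self._module_name = module_name
        self._attr_name = attr_name
        self._namespace = namespace      # e.g. the caller's globals()
        self._alias = alias or attr_name

    def _get_object(self):
        # The real import happens only now, on first use.
        module = importlib.import_module(self._module_name)
        obj = getattr(module, self._attr_name)
        # Swap the stub for the real object, but only if the name still
        # points at this stub. Later lookups then bypass the wrapper.
        if self._namespace.get(self._alias) is self:
            self._namespace[self._alias] = obj
        return obj

    def __call__(self, *args, **kwargs):
        return self._get_object()(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes the stub itself does not define.
        return getattr(self._get_object(), name)


# Usage: a plain dict stands in for a module's global namespace.
namespace = {}
namespace["sqrt"] = LazyStub("math", "sqrt", namespace)
first = namespace["sqrt"](16.0)   # triggers the import and the replacement
```

After the first call, namespace["sqrt"] is the real math.sqrt, so the wrapper is out of the picture. The "if the original namespace is available" caveat shows up here too: a reference to the stub that was copied elsewhere before first use keeps going through the wrapper.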