On 2/23/11 3:06 PM, Robert Bradshaw wrote:
On Wed, Feb 23, 2011 at 11:34 AM, William Stein <wst...@gmail.com> wrote:
On Wed, Feb 23, 2011 at 10:57 AM, Jason Grout <jason-s...@creativetrax.com> wrote:
On 2/23/11 12:28 PM, William Stein wrote:

At lunch yesterday Robert Bradshaw made the interesting suggestion to
read the docs for importlib
(http://docs.python.org/dev/library/importlib.html) and write a
customized import hook, so that every time during Sage startup that a
module is imported, the import is done from a single big in-memory zip
file instead of through the filesystem.  If this can be made to
work, it would be a huge win for slow filesystems.  The basic problem
is that some filesystems are fast but have huge *latency*.

Is it a big win primarily because the zip file contents can be read in and
cached by us?  I'm just trying to understand it better.

Which would you rather do on a high latency filesystem:

  (1) Read/stat 20,000 little files, or
  (2) Read exactly one 40MB file.

Is this the same idea as JAR files in Java?

I don't know.

Yep. In that case the "high latency file system" was a webserver.

You mean like http://docs.python.org/library/zipimport.html ?

Cool.

Note that this should just involve putting the zip file first in the
Python path (sys.path).
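
As a minimal sketch of that idea (written for Python 3, whereas the scripts later in this thread use Python 2 syntax): the archive name bundle.zip and the package name zipdemo_pkg are made up for illustration, and the only thing relied on is that Python's built-in zipimport support handles archives that appear on sys.path.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny throwaway package inside a zip archive.
# "bundle.zip" and "zipdemo_pkg" are hypothetical names for this sketch.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipdemo_pkg/__init__.py", "")
    zf.writestr("zipdemo_pkg/hello.py", "VALUE = 42\n")

# Putting the archive first on sys.path means it is searched before
# any directory that comes later on the path.
sys.path.insert(0, archive)
importlib.invalidate_caches()

hello = importlib.import_module("zipdemo_pkg.hello")
print(hello.VALUE)
print(hello.__file__)  # the reported location is inside the archive
```

No custom import hook is needed for this simple case; whether it is enough for all of Sage's imports is a separate question.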

I don't know for a fact that Robert Bradshaw's suggestion will be a
big win, since nobody has tried this yet.  But I'm optimistic.  The
idea would be to make a zip archive of
$SAGE_ROOT/local/lib/python/site-packages (say), and do *all* imports
using that massive zip archive.

I'm optimistic too. This would, of course, make more sense for
system-wide installs than development versions, but the former are
more likely to be on a non-local filesystem anyways.


Sounds like it is time for a trial!

I created a directory of 2000 .py files, plus an __init__.py file to make it a package:

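# Assumes the importtest/ directory already exists.
for i in range(2000):
    # Each module defines a single constant, VALUE, equal to its index.
    with open('importtest/test_%s.py'%i,'w') as f:
        f.write("VALUE=%s\n"%i)
# An __init__.py makes the directory importable as a package.
with open('importtest/__init__.py','w') as f:
    f.write(' ')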

Then I imported each of these so that .pyc files were created.

for i in range(2000):
    exec 'import importtest.test_%s'%i


Okay, then I copied the directory and zipped it up (in the shell now):

$ cp -r importtest zipimporttest
$ zip -r tmp.zip zipimporttest
$ rm -rf zipimporttest
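
For anyone who would rather do the copy-and-zip step from Python, here is a rough equivalent using the standard zipfile module. This is a Python 3 sketch, not what I ran: zip_package is a helper name invented here, and the demo uses a temporary directory with only five modules.

```python
import os
import tempfile
import zipfile


def zip_package(src_dir, archive_path, arcname):
    """Store every file under src_dir in archive_path, rooted at arcname/."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, src_dir)
                # Renaming the top-level folder inside the archive plays
                # the role of the "cp -r importtest zipimporttest" step.
                zf.write(full, os.path.join(arcname, rel))


# Demo on a small throwaway package.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "importtest")
os.mkdir(src)
for i in range(5):
    with open(os.path.join(src, "test_%s.py" % i), "w") as f:
        f.write("VALUE=%s\n" % i)
with open(os.path.join(src, "__init__.py"), "w") as f:
    f.write(" ")

archive = os.path.join(workdir, "tmp.zip")
zip_package(src, archive, "zipimporttest")

with zipfile.ZipFile(archive) as zf:
    names = sorted(zf.namelist())
print(names)
```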

One nice side effect is that the zip file is under 1 MB, while the directory of Python files takes around 16 MB.

Now for the test. Here are my two scripts. One imports each module in the directory and adds up the VALUE in each module:

% cat mytest.py
s=0
for i in range(2000):
    exec 'import importtest.test_%s as tt'%i
    s+=tt.VALUE
print s


The other first adds the zip to the front of sys.path and then does the same imports and summing, but using the zipped module:

% cat mytestzip.py
import sys
sys.path.insert(0,'./tmp.zip')
s=0
for i in range(2000):
    exec 'import zipimporttest.test_%s as tt'%i
    s+=tt.VALUE
print s


And now for the timings:

% time sage -python mytest.py
Detected SAGE64 flag
Building Sage on OS X in 64-bit mode
1999000
sage -python mytest.py  0.26s user 1.47s system 75% cpu 2.282 total


% time sage -python mytestzip.py
Detected SAGE64 flag
Building Sage on OS X in 64-bit mode
1999000
sage -python mytestzip.py  0.21s user 0.11s system 99% cpu 0.327 total


It looks like the zip is a clear winner in this case: about 0.33s total versus 2.28s. And this is with the directory presumably already in the filesystem cache.
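
If anyone wants to repeat the comparison without the shell steps, below is a self-contained sketch for Python 3 (the scripts above are Python 2). It is not the test I ran: the package names fs_pkg and zip_pkg are invented, N is reduced to 200 to keep it quick, and both imports are timed inside one process with time.perf_counter rather than with the shell's time. The numbers are also not directly comparable to the ones above, because on a first run the filesystem import normally compiles each module and writes a .pyc, while the zip import compiles from source without writing anything.

```python
import importlib
import os
import sys
import tempfile
import time
import zipfile

N = 200  # smaller than the 2000 modules used above, to keep the sketch quick

workdir = tempfile.mkdtemp()

# Package stored as ordinary files (hypothetical name: fs_pkg).
pkg_dir = os.path.join(workdir, "fs_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
for i in range(N):
    with open(os.path.join(pkg_dir, "test_%d.py" % i), "w") as f:
        f.write("VALUE=%d\n" % i)

# The same modules inside a zip archive (hypothetical name: zip_pkg).
archive = os.path.join(workdir, "tmp.zip")
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zip_pkg/__init__.py", "")
    for i in range(N):
        zf.writestr("zip_pkg/test_%d.py" % i, "VALUE=%d\n" % i)

# Make both locations importable.
sys.path.insert(0, archive)
sys.path.insert(0, workdir)
importlib.invalidate_caches()


def import_and_sum(package):
    """Import package.test_0 .. test_{N-1}; return (sum of VALUE, seconds)."""
    start = time.perf_counter()
    total = 0
    for i in range(N):
        mod = importlib.import_module("%s.test_%d" % (package, i))
        total += mod.VALUE
    return total, time.perf_counter() - start


fs_total, fs_time = import_and_sum("fs_pkg")
zip_total, zip_time = import_and_sum("zip_pkg")
print("filesystem: sum=%d in %.3fs" % (fs_total, fs_time))
print("zip:        sum=%d in %.3fs" % (zip_total, zip_time))
```

On a local disk with a warm cache the gap may be small or absent; the interesting case is a high-latency filesystem, which this sketch does not simulate.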

Thanks,

Jason
