I'm not sure whether this discussion has been picked up in a different
thread since February, but the problem with zipimport seems to be that it
can't load .so files. Since we have plenty of them, loading just the .py
files from a zip file and the .so files from the file system might not be
such a big improvement anymore.
To address the problem with high-latency filesystems, why don't we unzip a
file containing all of site-packages/sage to some directory in /tmp and
import everything from there?
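
A minimal sketch of that idea, assuming Python 3 and the standard zipfile
and tempfile modules. The function name unpack_and_prepend and its prefix
argument are made up for illustration; nothing like this exists in Sage as
far as I know.

```python
import os
import sys
import tempfile
import zipfile


def unpack_and_prepend(archive_path, prefix="sage-import-"):
    """Extract a zip archive into a fresh temp directory and put that
    directory first on sys.path.

    Returns the extraction directory so the caller can delete it later.
    (Hypothetical helper, for illustration only.)
    """
    # mkdtemp creates a private, uniquely named directory (under /tmp on
    # most Unix systems), which avoids clashes between users or processes.
    target = tempfile.mkdtemp(prefix=prefix)
    with zipfile.ZipFile(archive_path) as archive:
        # The slow filesystem sees one large sequential read of the archive
        # instead of thousands of small reads and stats.
        archive.extractall(target)
    # Everything is now ordinary files on local disk, so the normal import
    # machinery handles .py and .so files alike.
    sys.path.insert(0, target)
    return target
```

The extraction itself costs time on every startup unless the directory is
reused, so whether this wins overall would have to be measured.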

julian

On Wednesday, February 23, 2011 10:47:14 PM UTC+1, jason wrote:
On 2/23/11 3:06 PM, Robert Bradshaw wrote:
> On Wed, Feb 23, 2011 at 11:34 AM, William Stein<wst...@gmail.com> wrote:
>> On Wed, Feb 23, 2011 at 10:57 AM, Jason Grout
>> <jason...@creativetrax.com> wrote:
>>> On 2/23/11 12:28 PM, William Stein wrote:
>>>>
>>>> At lunch yesterday Robert Bradshaw made the interesting suggestion to
>>>> read the docs for importlib
>>>> (http://docs.python.org/dev/library/importlib.html) and write a
>>>> customized import hook, so that every time during Sage startup that a
>>>> module is imported, the import is done from a single big in-memory zip
>>>> file instead of done using the filesystem. If this can be made to
>>>> work, it would be a huge win for slow filesystems. The basic problem
>>>> is that some filesystems are fast but have huge *latency*.
>>>
>>> Is it a big win primarily because the zip file contents can be read in
>>> and cached by us? I'm just trying to understand it better.
>>
>> Which would you rather do on a high latency filesystem:
>>
>> (1) Read/stat 20,000 little files, or
>> (2) Read exactly one 40MB file.
>>
>>> Is this the same idea as Jar files in java?
>>
>> I don't know.
>
> Yep. In that case the "high latency file system" was a webserver.
>
>>> You mean like http://docs.python.org/library/zipimport.html ?
>>
>> Cool.
>
> Note that this should just involve putting the zip file first in the
> python path.
>
>> I don't know for a fact that Robert Bradshaw's suggestion will be a
>> big win, since nobody has tried this yet. But I'm optimistic. The
>> idea would be to make a zip archive of
>> $SAGE_ROOT/local/lib/python/site-packages (say), and do *all* imports
>> using that massive zip archive.
>
> I'm optimistic too. This would, of course, make more sense for
> system-wide installs than development versions, but the former are
> more likely to be on a non-local filesystem anyways.


Sounds like it is time for a trial!

I created a directory of 2000 .py files and an __init__.py file to make
it a package:

# assumes the importtest/ directory already exists
for i in range(2000):
    with open('importtest/test_%s.py'%i,'w') as f:
        f.write("VALUE=%s\n"%i)
with open('importtest/__init__.py','w') as f:
    f.write(' ')

Then I imported each of these so that .pyc files were created.

for i in range(2000):
    exec 'import importtest.test_%s'%i


Okay, then I copied the directory and zipped it up (in the shell now):

$ cp -r importtest zipimporttest
$ zip -r tmp.zip zipimporttest
$ rm -rf zipimporttest

One nice side effect is that the zip file is less than 1 MB, while the
directory of Python files is around 16 MB.

Now for the test. Here are my two scripts. One imports each module in 
the directory and adds up the VALUE in each module:

% cat mytest.py
s=0
for i in range(2000):
    exec 'import importtest.test_%s as tt'%i
    s+=tt.VALUE
print s


The other first adds the zip to the front of sys.path and then does the 
same imports and summing, but using the zipped module:

% cat mytestzip.py
import sys
sys.path.insert(0,'./tmp.zip')
s=0
for i in range(2000):
    exec 'import zipimporttest.test_%s as tt'%i
    s+=tt.VALUE
print s


And now for the timings:

% time sage -python mytest.py
Detected SAGE64 flag
Building Sage on OS X in 64-bit mode
1999000
sage -python mytest.py 0.26s user 1.47s system 75% cpu 2.282 total


% time sage -python mytestzip.py
Detected SAGE64 flag
Building Sage on OS X in 64-bit mode
1999000
sage -python mytestzip.py 0.21s user 0.11s system 99% cpu 0.327 total


It looks like the zip is a clear winner in this case. And this is with 
the directory presumably in the FS cache.
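
For anyone who wants to rerun the comparison on a current interpreter,
here is a rough Python 3 sketch of the same experiment in a single file.
The helper names (build_packages, timed_import_sum, run_comparison) are
made up for illustration. Unlike the test above, the zip here holds only
.py sources with no precompiled .pyc files, and both variants run in one
process without interpreter startup, so the numbers it produces are not
directly comparable to the timings quoted above.

```python
import importlib
import os
import sys
import tempfile
import time
import zipfile


def build_packages(root, count):
    """Create package 'importtest' on disk and 'zipimporttest' in tmp.zip.

    Both packages contain modules test_0 .. test_{count-1}, each defining
    VALUE. Returns the path of the zip file.
    """
    pkg_dir = os.path.join(root, "importtest")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    zip_path = os.path.join(root, "tmp.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("zipimporttest/__init__.py", "")
        for i in range(count):
            source = "VALUE=%s\n" % i
            with open(os.path.join(pkg_dir, "test_%s.py" % i), "w") as f:
                f.write(source)
            archive.writestr("zipimporttest/test_%s.py" % i, source)
    return zip_path


def timed_import_sum(package, count):
    """Import every test module of `package` and add up VALUE.

    Returns (sum of VALUE, elapsed seconds).
    """
    start = time.perf_counter()
    total = 0
    for i in range(count):
        module = importlib.import_module("%s.test_%s" % (package, i))
        total += module.VALUE
    return total, time.perf_counter() - start


def run_comparison(count=2000):
    """Time directory imports against zip imports for `count` modules."""
    root = tempfile.mkdtemp(prefix="importtest-")
    zip_path = build_packages(root, count)
    sys.path.insert(0, root)      # makes 'importtest' importable
    sys.path.insert(0, zip_path)  # zip first on the path, as in mytestzip.py
    importlib.invalidate_caches()
    return {
        "directory": timed_import_sum("importtest", count),
        "zip": timed_import_sum("zipimporttest", count),
    }
```

Calling run_comparison() returns a dict with a (sum, seconds) pair for each
variant. With the default of 2000 modules both sums should be 1999000, the
same check value as in the runs above.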

Thanks,

Jason
