Stripping Sage Binaries II
--------------------------

By hardlinking duplicate files and stripping executables, a size
reduction of 438 MB (-26%) was achieved. Further reduction involves
moving directories, which breaks sage -testall. The goal is to produce
a binary package of Sage with adequate functionality and reduced
size, so I couldn't resist pushing on and accepting failing tests to see
the overall potential.

Preliminary results:
-----------------------------------------------
Original directory tree (~1900 MB)
sage-binary (775 MB / squashed FS 218 MB)
That is -60% / -88% compared to the original size!
sage-dev (529 MB)
sage-doc (222 MB)
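The percentages can be re-checked with a little shell arithmetic. The helper name reduction_pct is made up for this note, and the inputs are the rounded MB figures from the list:

```shell
#!/bin/sh
# reduction_pct OLD NEW: percentage by which NEW is smaller than OLD,
# printed with one decimal. Hypothetical helper; awk does the floating
# point work because POSIX sh arithmetic is integer-only.
reduction_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (1 - new / old) * 100 }'
}

reduction_pct 1900 775   # sage-binary vs. original tree, prints 59.2
reduction_pct 1900 218   # squashed sage-binary vs. original tree, prints 88.5
```

With the approximate 1900 MB baseline this is in line with the rounded -60% / -88% above.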

The resulting binaries seem to work in a superficial test. The
stripped binaries can be tested as a live ISO or in a virtual machine
image (an easy frugal install into an existing Linux desktop is also
possible).

Download the ISO image (400 MB; that's the Live CD base distribution
plus the stripped binaries):
http://boxen.math.washington.edu/home/emil/sagelithe/

Stripping Procedure:
--------------------
It started with building a binary distribution on sagelive-511-46-r3
(Live CD release), see footnote 1). The resulting directory tree of
this build was manually split into three directories:
sage-binary
sage-devel
sage-doc

Most of what was moved out of the original binary tree was in the
following directories 2):
SAGE_ROOT/devel/sage (190 MB)
SAGE_ROOT/devel/sage-main/build/lib.linux-i686-2.6/sage (78 MB)
SAGE_ROOT/devel/sage-main/build/temp.linux-i686-2.6/sage (104 MB)
SAGE_ROOT/devel/sagenb-main/dist (14 MB)
SAGE_ROOT/devel/sagenb-main/build/lib/sagenb (35 MB)
Hidden directories:
SAGE_ROOT/devel/sage-main/.hg (50 MB)
SAGE_ROOT/devel/sage-main/.hg (39 MB)
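The manual split amounts to moving each of these paths out of the binary tree while keeping its relative location. As a sketch only, a hypothetical helper move_out (not part of Sage, and not necessarily how the split above was actually done) could do this in plain sh:

```shell
#!/bin/sh
# move_out FROM_ROOT TO_ROOT REL_PATH
# Hypothetical helper: moves FROM_ROOT/REL_PATH to TO_ROOT/REL_PATH,
# creating parent directories, so the split-off tree keeps the same
# relative layout as the original SAGE_ROOT.
move_out() {
    from=$1/$3
    to=$2/$3
    if [ ! -e "$from" ] && [ ! -L "$from" ]; then
        echo "move_out: $from does not exist" >&2
        return 1
    fi
    # refuse to overwrite, otherwise mv would nest the source inside $to
    if [ -e "$to" ]; then
        echo "move_out: $to already exists" >&2
        return 1
    fi
    mkdir -p "$(dirname "$to")" && mv "$from" "$to"
}

# Example (illustrative paths):
# move_out sage-binary sage-devel devel/sage-main/.hg
```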

After that, the stripping procedure from the first attempt (hardlinking
duplicate file instances, stripping binaries) was applied to the binary
directory 3).

The resulting binary package worked for me in a brief test 4) in a
fresh install of the base distribution. However, sage -testall no
longer works, so it is hard to confirm which parts of Sage might be
broken. Tracebacks seemed to work, because all the Python source code
stayed in the remaining directories.

To investigate further possibilities for reduction, I also checked the
size of the source files still present in the binary tree 5):

Total file size of Python source is:  83506845 Bytes
Total file size of Lisp source is:    14091345 Bytes
Total file size of C source is:        5444780 Bytes
Total file size of C++ source is:       163105 Bytes
Total file size of C headers is:       3779884 Bytes
--------------------
Total size of source code found:     106985959 Bytes

So removing the source files would gain another 100 MB.
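The "another 100 MB" follows from the byte total above; whether one counts in units of 10^6 or 2^20 bytes, it lands at roughly 100 MB. A tiny hypothetical helper to do the conversion:

```shell
#!/bin/sh
# to_mb BYTES: prints the size in decimal megabytes (10^6 bytes) and in
# mebibytes (2^20 bytes). Hypothetical helper for checking the totals.
to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.1f MB / %.1f MiB\n", b / 1e6, b / 1048576 }'
}

to_mb 106985959   # prints 107.0 MB / 102.0 MiB
```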
As I understand it, the ability to show tracebacks on errors would be
lost, but at the moment I fear that removing the sources would break
Sage completely. Another aspect: there are lots of comments. An
educated guess puts the size of the comments in the Python code at
about 40 MB; this estimate assumes the original line numbering is
preserved, so tracebacks would still report the right line numbers. If
the C code could also be moved out, a reduction of about 50 MB would be
possible (I don't know whether Maxima's Lisp code can be moved out).
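Blanking comment lines instead of deleting them is what keeps the line numbers intact. A minimal sed-based sketch, assuming only whole-line comments are removed (trailing comments and docstrings are left alone); blank_comment_lines is a hypothetical name, not an existing tool:

```shell
#!/bin/sh
# blank_comment_lines FILE
# Hypothetical sketch: empties every line of a Python file that holds only
# a comment, but keeps the (now empty) line so traceback line numbers stay
# valid. Lines 1-2 are left untouched because they may carry the shebang
# and the "# -*- coding: ... -*-" declaration.
# Caveat: a line inside a triple-quoted string that starts with "#" would
# be emptied too.
blank_comment_lines() {
    tmpf=$(mktemp) || return 1
    # write back with cat (not mv) so permissions and hardlinks are kept
    sed '3,$ s/^[[:space:]]*#.*$//' "$1" > "$tmpf" && cat "$tmpf" > "$1"
    rm -f "$tmpf"
}
```

Over a whole tree this could be driven by a loop over the output of find . -name '*.py'. Because the change is idempotent, visiting several hardlinked names of the same file does no harm.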

For the binaries themselves, upx compression could be used. On the Live
CD this is not needed, because the files already live in a squashed
filesystem, but for distributions that use uncompressed filesystems it
could give a further substantial reduction.
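If upx is tried, it may refuse some files (for example very small ones), so it helps to process executables one at a time and keep going. The loop below is a sketch under that assumption; upx_each and the UPX override variable are made up for this note, and only the standard --best option of upx is used:

```shell
#!/bin/sh
# upx_each: reads file names (one per line) on stdin and runs
# "upx --best" on each, continuing when upx refuses a file.
# Hypothetical helper; set UPX=echo for a dry run that only prints.
upx_each() {
    failed=0
    while IFS= read -r f; do
        [ -n "$f" ] || continue          # skip empty lines
        "${UPX:-upx}" --best "$f" || failed=$((failed + 1))
    done
    if [ "$failed" -ne 0 ]; then
        echo "upx_each: $failed file(s) not compressed" >&2
        return 1
    fi
}

# Example: feed it the ELF executables selected as in footnote 3):
# find . -type f | xargs file | grep "executable" | grep ELF |
#     cut -f 1 -d : | upx_each
```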

I had no prior knowledge of the structure of the Sage package, so the
split may not be correct and some essential files may be missing from
the binaries. It is also possible that many more files and directories
could be omitted from the binary package and moved to one of the
others.

For further work I would be grateful for any input on the following:
Testing of the binaries: are there suggestions on how to implement a
working "sage -testall" for binaries like these?
Feedback on the quality of the split: which files and directories were
missed, or are now in the wrong place?
Information about the doc tree: which files are needed to make the ?
command work in the command line interface?
Testing of the development abilities: how does it behave when the
development package is loaded? Can binaries stripped with
--strip-unneeded be used for development? (Otherwise it would be
possible to fall back to --strip-debug for libraries.)

Summary
-------
A substantial reduction of the size of the Sage binaries was achieved
using a combined approach of manual splitting, hardlinking duplicate
files and stripping executables. The binary package was reduced to a
size of 792 MB, compared to over 1900 MB for the original directory
tree. This is a reduction of 60%. In the squashed package the size went
from 438 MB to 222 MB (-49%). "sage -testall" does not work any more in
the reduced binary, so further testing is needed to confirm the
functionality of the created binary package.

Footnotes:
---------
1)
 #!/bin/sh
 # build sage binaries for sagelive, be sure that Tcl/Tk is installed
 export SAGE_MATPLOTLIB_GUI="yes"
 export SAGE_FAT_BINARY="yes"
 make
 ./sage -bdist sagelive-511-4.6.1-r4-fat

comment:
In my opinion it is important that as many features of the Sage
components as possible are available. Plotting from R and from pylab
(Tcl/Tk backend) is accessible in the standard way. It has not been
possible to integrate other matplotlib backends so far; I wonder how
much they would add to the total size.

Are there any additional environment variables that should be set when
generating the binaries? The idea is to use the Sage components and
libraries as the core of the distribution and to integrate them
tightly. What do other components (e.g. Maxima) need to "work out of
the box"?

2)
Text files with the du -ch output for each package are available here:
http://boxen.math.washington.edu/home/emil/sagelithe
The doc and dev packages can be loaded as packages into the live
version (coming soon ...).

3)
This is the procedure to hardlink duplicate file instances and strip
binaries:

#!/bin/sh
# Script to reduce the size of the directory tree and binaries.
# Uses the fslint package (http://www.pixelbeat.org/fslint/); be sure the
# fslint scripts are in your PATH, or edit the findup line below so that
# findup is found.
cd "$SAGE_ROOT" || exit 1
# replace duplicate files with hardlinks
findup -m .
# strip executables
find . -type f | xargs file | grep "executable" | grep ELF |
    cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null
# Level 1 stripping for shared libraries (comment/uncomment to switch)
# find . -type f | xargs file | grep "shared object" | grep ELF |
#     cut -f 1 -d : | xargs strip --strip-debug 2> /dev/null
# Level 2 stripping for shared libraries (comment/uncomment to switch)
find . -type f | xargs file | grep "shared object" | grep ELF |
    cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null
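To check that findup -m actually merged files, one can count the regular files that now share their data with another name. The helper below is a hypothetical sketch, not part of fslint; it only relies on find's -links test:

```shell
#!/bin/sh
# count_multilinked DIR: counts regular files under DIR whose link count
# is greater than one, i.e. names that share data after findup -m.
# Hypothetical helper for a quick before/after check.
count_multilinked() {
    echo $(( $(find "$1" -type f -links +1 | wc -l) ))
}

# e.g. in SAGE_ROOT:  count_multilinked . ; du -sh .
```

Running du -sh . before and after findup also shows the saving, since GNU du counts each hardlinked file only once per run.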

4)
Sage starts up fine in the console and in the notebook. Some quick
plotting and simple equation solving work in the notebook without
flaws.

sage -sh
R
demo(graphics)
works and produces the R demo plots.

sage -python
from pylab import *
plot ([1,2],[2,1])
show()

produced a plot
(I compiled with Tcl/Tk and have this dependency included in sagelive.)

The built-in help (docstrings) doesn't work in the console, i.e. plot ?
gives just a short description and then
Docstring:
< no docstring >

same command in the notebook works well.

5)
Just a quick copy-and-paste hack:

#!/bin/sh
# calculates size of source files in directory tree
tsum=0
sum=0
# check python
for k in `find . -name '*.py' -exec ls -l {} \+ | awk '{print $5}'`
do
   sum=$((sum+k))
done
echo "Total file size of Python source is: $sum Bytes"
tsum=$((tsum+sum))
sum=0
# check lisp
for k in `find . -name '*.lisp' -exec ls -l {} \+ | awk '{print $5}'`
...SNIP
etc ...
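For comparison, a compact variant of the same measurement could avoid parsing ls output. This is a sketch, not the original (snipped) script: it assumes GNU find for -printf, and the extension list (in particular *.cpp for C++ and *.h for headers) is a guess at what the original loop checked; source_sizes is a hypothetical name.

```shell
#!/bin/sh
# source_sizes [DIR]: sums the sizes (in bytes) of files with each source
# extension under DIR (default: current directory) and prints a total.
# Assumes GNU find (-printf '%s' prints the file size in bytes).
source_sizes() {
    dir=${1:-.}
    total=0
    for ext in py lisp c cpp h; do
        sum=$(find "$dir" -type f -name "*.$ext" -printf '%s\n' |
              awk '{ s += $1 } END { printf "%.0f\n", s }')
        echo "Total file size of *.$ext files: $sum Bytes"
        total=$((total + sum))
    done
    echo "Total size of source code found: $total Bytes"
}
```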
