Stripping Sage Binaries II
--------------------------

By hardlinking duplicate files and stripping executables, a size reduction of 438 MB (-26%) was achieved. Further reduction requires moving directories out of the tree, which breaks "sage -testall". The goal is to produce a binary package of Sage with adequate functionality and reduced size, so I couldn't resist pushing on and accepting failing tests to see the overall potential.
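Hardlinking saves space because byte-identical files then share a single inode, so their data is stored only once. The procedure in footnote 3 uses fslint's findup for this. For readers without fslint, a rough equivalent using standard tools might look like the sketch below. hardlink_dupes is a hypothetical helper name (not part of Sage or fslint), and the sketch assumes GNU md5sum, ordinary file names (no newlines, backslashes or leading/trailing blanks) and a tree on a single filesystem.

```shell
#!/bin/sh
# Hypothetical helper, not the fslint-based method from footnote 3:
# replace byte-identical regular files below a directory with hardlinks.
# Assumes GNU md5sum, ordinary file names and a single filesystem.
hardlink_dupes() {
    dir=$1
    prev_sum=
    prev_path=
    # Checksum every regular file and sort, so identical content ends up
    # on adjacent lines of the form "<md5>  <path>".
    find "$dir" -type f -exec md5sum {} + | sort |
    while read -r sum path; do
        if [ "$sum" = "$prev_sum" ] && cmp -s "$prev_path" "$path"; then
            # Same bytes as the first copy: make this name a hardlink to it,
            # unless it already is one.
            [ "$prev_path" -ef "$path" ] || ln -f "$prev_path" "$path"
        else
            # New content: remember this file as the copy to link against.
            prev_sum=$sum
            prev_path=$path
        fi
    done
}
```

Usage, e.g. on a copy of the binary tree: hardlink_dupes sage-binary. Since all linked names share one inode, the first copy's permissions and timestamps win, and modifying the file through one name changes it for all of them.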
Preliminary results:
--------------------

Original directory tree (~1900 MB), split into:

  sage-binary (775 MB / squashed FS 218 MB), that is -60% / -88% compared to the original size!
  sage-dev (529 MB)
  sage-doc (222 MB)

The resulting binaries seem to work in a superficial test. The stripped binaries can be tested as a live ISO or in a virtual machine image (a frugal install into an existing Linux desktop is also easy).

Download the ISO image (400 MB, that is the Live CD base distribution plus the stripped binaries):
http://boxen.math.washington.edu/home/emil/sagelithe/

Stripping procedure:
--------------------

It started with building a binary distribution on sagelive-511-46-r3 (Live CD release), see footnote 1). The resulting directory tree of this build was manually split into 3 directories:

  sage-binary
  sage-devel
  sage-doc

The bulk of what was moved out of the original binary tree consisted of the following directories 2):

  SAGE_ROOT/devel/sage (190 MB)
  SAGE_ROOT/devel/sage-main/build/lib.linux-i686-2.6/sage (78 MB)
  SAGE_ROOT/devel/sage-main/build/temp.linux-i686-2.6/sage (104 MB)
  SAGE_ROOT/devel/sagenb-main/dist (14 MB)
  SAGE_ROOT/devel/sagenb-main/build/lib/sagenb (35 MB)

Hidden directories:

  SAGE_ROOT/devel/sage-main/.hg (50 MB)
  SAGE_ROOT/devel/sage-main/.hg (39 MB)

After that, the stripping procedure from the first attempt (hardlink duplicate file instances, strip binaries) was applied to the binary directory 3). The resulting binary package worked for me in a brief test 4) in a fresh install of the base distribution. However, "sage -testall" no longer works, so it is not easy to confirm which parts of Sage might be broken. Tracebacks seemed to work, because all the Python source code stayed in the remaining binary tree.
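The manual split amounts to moving each listed directory from the binary tree to the same relative path in the sage-devel (or sage-doc) tree, so the packages can later be overlaid on each other. A minimal sketch of that move as a shell function; move_subtree is a hypothetical name, and the sketch assumes the binary tree is a copy of SAGE_ROOT named sage-binary with the target trees sitting next to it.

```shell
#!/bin/sh
# Hypothetical helper for the manual split: move SRC_ROOT/RELPATH to
# DEST_ROOT/RELPATH, creating parent directories so the relative layout
# (e.g. devel/sagenb-main/dist) is identical in the new tree.
move_subtree() {
    src_root=$1
    dest_root=$2
    rel=$3
    if [ ! -e "$src_root/$rel" ] && [ ! -L "$src_root/$rel" ]; then
        echo "missing: $src_root/$rel" >&2
        return 1
    fi
    if [ -e "$dest_root/$rel" ] || [ -L "$dest_root/$rel" ]; then
        # Refuse, because mv would otherwise nest the directory inside
        # the existing one instead of replacing it.
        echo "already exists: $dest_root/$rel" >&2
        return 1
    fi
    mkdir -p "$dest_root/$(dirname "$rel")" &&
        mv "$src_root/$rel" "$dest_root/$rel"
}
```

Example, using two of the directories listed above:

    move_subtree sage-binary sage-devel devel/sagenb-main/dist
    move_subtree sage-binary sage-devel devel/sagenb-main/build/lib/sagenb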
To investigate further possibilities for reduction, I also checked how much source code is still present in the binary tree 5):

  Total file size of Python source is:    83506845 Bytes
  Total file size of lisp source is:      14091345 Bytes
  Total file size of C source is:          5444780 Bytes
  Total file size of C++ source is:         163105 Bytes
  Total file size of C headers is:         3779884 Bytes
  --------------------
  Total size of source code found:       106985959 Bytes

So removing the source files would gain roughly another 100 MB. As I understand it, the ability to show tracebacks on errors would be lost, but at the moment I fear it would break Sage completely.

Another aspect: there are lots of comments. An educated guess for the size of the comments in the Python code is about 40 MB. This estimate already accounts for preserving the original line numbering, so tracebacks would still report the right line numbers. If the C code could also be moved out, a reduction of about 50 MB would be possible (I don't know whether Maxima's Lisp code can be moved out).

Regarding binaries, there is also the possibility of UPX compression. On the Live CD this is not needed, because the files already sit in a squashed filesystem, but for distributions that use uncompressed filesystems it could give a further substantial reduction.

I had no prior knowledge of the structure of the Sage package, so the split may be incorrect and some essential files may be missing from the binaries. It is also possible that many more files and directories could be omitted from the binary package and moved to one of the others.

For further work I would be grateful for any input on the following:

  - Testing the binaries: suggestions on how to implement a working "sage -testall" for binaries like these?
  - Feedback on the quality of the split: which files and directories were missed, or are now in the wrong place?
  - Information about the doc tree: which files are responsible for making the ? command work in the CLI?
  - Testing the development features: how does it behave when the development package is loaded? Can binaries stripped with --strip-unneeded be used for development? (Otherwise it would be possible to fall back to --strip-debug for libraries.)

Summary
-------

A substantial reduction of the size of the Sage binaries was achieved with a combined approach of manual splitting, hardlinking duplicate files and stripping executables. The binary package was reduced to 792 MB, compared to over 1900 MB for the original directory tree, a reduction of about 60%. In the squashed package the size went from 438 MB to 222 MB (-49%). "sage -testall" no longer works in the reduced binary, so further testing is needed to confirm the functionality of the resulting binary package.

Footnotes:
----------

1)
    #!/bin/sh
    # build sage binaries for sagelive, be sure that Tcltk is installed
    export SAGE_MATPLOTLIB_GUI="yes"
    export SAGE_FAT_BINARY="yes"
    make
    ./sage -bdist sagelive-511-4.6.1-r4-fat

Comment: In my opinion it is important that as many features of the Sage components as possible are available. Plotting from R and from pylab (Tcl/Tk backend) is accessible in the standard way. So far it has not been possible to integrate other matplotlib backends; I wonder how much they would add to the total size? Are there any additional environment variables that should be set when generating the binaries? The idea is to use the Sage components and libraries as the core of the distribution and to integrate them tightly. What do other components (e.g. Maxima) need to work out of the box?

2) Text files with the du -ch output for the packages are available here:
http://boxen.math.washington.edu/home/emil/sagelithe
The doc and dev packages can be loaded as packages into the live version (coming soon ...).
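Listings like the du -ch text files mentioned in footnote 2, and the percentages quoted in the summary, are easy to regenerate. A small sketch, assuming GNU du (for --max-depth); size_report and reduction_pct are hypothetical helper names, and the listing depth is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helpers: write a du -ch listing for one package directory
# and compute a percentage size reduction. Assumes GNU du.
size_report() {
    pkg=$1
    out=$2
    # -c appends a grand-total line, -h prints human-readable sizes,
    # --max-depth=2 limits the listing to the top levels of the tree.
    du -ch --max-depth=2 "$pkg" > "$out"
}

# Percentage reduction from BEFORE to AFTER (same unit), rounded down.
reduction_pct() {
    before=$1
    after=$2
    echo $(( (before - after) * 100 / before ))
}
```

For example, reduction_pct 438 222 prints 49, the -49% quoted for the squashed package.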
3) This is the procedure to hardlink duplicate file instances and strip binaries:

    #!/bin/sh
    # script to reduce size of directory tree and binaries, uses the package fslint (http://www.pixelbeat.org/fslint/)
    # be sure to have the scripts of fslint in your path, or edit line 6 so that findup is found.
    cd "${SAGE_ROOT:?SAGE_ROOT is not set}" || exit 1
    # replace double files with hardlinks
    findup -m .
    # strip executables
    find . -type f -print0 | xargs -0 file | grep "executable" | grep ELF | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null
    # Level 1 stripping for shared libraries (comment/uncomment to switch)
    # find . -type f -print0 | xargs -0 file | grep "shared object" | grep ELF | cut -f 1 -d : | xargs strip --strip-debug 2> /dev/null
    # Level 2 stripping for shared libraries (comment/uncomment to switch)
    find . -type f -print0 | xargs -0 file | grep "shared object" | grep ELF | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null

4) Sage starts up fine in the console and in the notebook. Some quick plotting and simple equation solving work in the notebook without flaws.

    sage -sh
    R
    demo(graphics)

works and produces the R demo plots.

    sage -python
    from pylab import *
    plot([1,2],[2,1])
    show()

produced a plot (I compiled with Tcl/Tk and included this dependency in sagelive).

The built-in help (docstrings) doesn't work in the console! For example, plot? gives just a short description and then "Docstring: <no docstring>". The same command works well in the notebook.

5) Just a quick copy-and-paste hack:

    #!/bin/sh
    # calculates size of source files in directory tree
    tsum=0
    sum=0
    # check python
    for k in `find . -name '*.py' -exec ls -l {} + | awk '{print $5}'`
    do
        sum=$((sum+k))
    done
    echo "Total file size of Python source is: $sum Bytes"
    tsum=$((tsum+sum))
    sum=0
    # check lisp
    for k in `find . -name '*.lisp' -exec ls -l {} + | awk '{print $5}'`
    ...SNIP etc ...
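A shorter way to get the same per-language totals is to let GNU find print each file's size directly with -printf '%s' and sum the values with awk, rather than parsing ls -l output. A sketch, assuming GNU find; sum_sizes and source_report are hypothetical names, and the file patterns are assumptions, since the full list of extensions behind the figures above is not shown.

```shell
#!/bin/sh
# Hypothetical alternative to the script in footnote 5, using GNU find's
# -printf '%s' (file size in bytes). The patterns below are assumptions.

# Sum the sizes of regular files below DIR whose name matches PATTERN.
sum_sizes() {
    dir=$1
    pattern=$2
    # The pattern is quoted so the shell passes it to find unexpanded.
    find "$dir" -type f -name "$pattern" -printf '%s\n' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Print one line per source type plus a grand total, in the format above.
source_report() {
    dir=$1
    grand=0
    for spec in "Python source:*.py" "lisp source:*.lisp" "C source:*.c" \
                "C++ source:*.cpp" "C headers:*.h"; do
        label=${spec%%:*}   # text before the first colon
        pattern=${spec#*:}  # text after the first colon
        n=$(sum_sizes "$dir" "$pattern")
        echo "Total file size of $label is: $n Bytes"
        grand=$((grand + n))
    done
    echo "Total size of source code found: $grand Bytes"
}
```

Usage: run source_report "$SAGE_ROOT" (or source_report . from inside the tree).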
-- To post to this group, send an email to sage-devel@googlegroups.com To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/sage-devel URL: http://www.sagemath.org