On 1 December 2015 at 22:36, Giuseppe Scrivano <gscri...@redhat.com> wrote:
> Hi,
>
> I was experimenting with reducing the size of the Atomic Host image and
> it seems that a lot of space is used by Python source files.

This sounds like an idea worth considering to me - we (wearing my
CPython hat) fully support sourceless deployments, but I hadn't
considered that they could be used to save space in container
images.

The main downside I see is that it's going to lose a lot of information
from Python tracebacks, and, at least as far as I am aware, the tooling
doesn't currently exist to stitch a sourceless traceback back together
with a suitable source tree to get a readable traceback again.

> I had to deal differently with the two versions as Python 3 handles
> source-less distributions in a different way than Python 2.
>
> Python 2 simply loads the *.pyc file when the *.py file is missing.
>
> Python 3 requires an additional step, as it puts the precompiled version
> under the __pycache__ directory, but it expects the file to be one level
> upper when the source file is missing:
>
> /foo/__pycache__/test.cpython-34.pyo -> /foo/test.pyc

The Python 3 compileall module supports this directly, by passing the
"-b" option:
https://docs.python.org/3/library/compileall.html#command-line-use

> This patch reduces the used disk space by around 55 MB.
>
> Any comments?

> diff --git a/treecompose-post.sh b/treecompose-post.sh
> index 73b6573..39f1ba0 100755
> --- a/treecompose-post.sh
> +++ b/treecompose-post.sh
> @@ -8,3 +8,40 @@ find /usr/share/locale -mindepth 1 -maxdepth 1 -type d -not
> -name "${KEEPLANG}"
> localedef --list-archive | grep -a -v ^"${KEEPLANG}" | xargs localedef
> --delete-from-archive
> mv -f /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl
> build-locale-archive
> +
> +# Compile all the files.
> +find /usr/lib*/python2.* -type d -exec python2 -OO -m compileall -l {} +
> +find /usr/lib*/python3.* -type d -exec python3 -OO -m compileall -l {} +

If you pass "-b" in the second line, Python 2 & 3 should produce files
in the same places.
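
As a rough illustration of what "-b" does, here is a minimal sketch
using the compileall API, where legacy=True is the Python-level
counterpart of the "-b" command-line option. It assumes Python 3.5 or
later (where legacy bytecode files are always named *.pyc); the module
name demo_mod, the helper build_sourceless_tree and the temporary
directory are hypothetical names used only for this example, not part
of the patch above.

```python
import compileall
import importlib
import os
import sys
import tempfile


def build_sourceless_tree(root):
    """Compile each .py under root to a sibling .pyc, then delete the sources.

    Hypothetical helper for illustration only.
    """
    # legacy=True writes foo.pyc next to foo.py instead of into __pycache__/,
    # which is where the import system looks when the source file is missing.
    ok = compileall.compile_dir(root, legacy=True, quiet=1)
    if not ok:
        raise RuntimeError("byte-compilation failed for %s" % root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                os.remove(os.path.join(dirpath, name))


# Build a throwaway tree containing a single module.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_mod.py"), "w") as f:
    f.write('"""Demo docstring."""\nVALUE = 42\n')

build_sourceless_tree(workdir)
remaining = sorted(os.listdir(workdir))  # only the .pyc should be left

# The module still imports, now from the bytecode file alone.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
demo_mod = importlib.import_module("demo_mod")
print(remaining, demo_mod.VALUE)
```

No optimisation level is passed here, so the bytecode is compiled at
the level of the running interpreter and docstrings are kept under a
default invocation.
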
However, -OO strips docstrings, which can break some applications -
this is why the Fedora Python packaging guidelines were recently
updated to advise against using -OO for system packages. (Changes to
the way optimisation levels are handled for Python 3.5+ mean that
restriction only applies to 3.4 and earlier, but even for 3.5+, it
would still be a problem when renaming the bytecode files to run even
in the default __debug__ mode).

> +# Here we treat Python 2 and Python 3 differently:
> +
> +# Python 2
> +# *.pyo files are basically *.pyc files, except that when the source
> +# file is missing Python will load only the .pyc file:
> +find /usr/lib*/python2.* -type f -name "*.pyo" | while read i
> +do
> +    destination_pyc=$(echo $i | sed -e's|pyo$|pyc|')
> +    rm -f $destination_pyc
> +    mv $i $destination_pyc
> +done

This renaming would lead to -OO modules being loaded into __debug__
and -O processes, potentially causing problems for code expecting
docstrings to be present. It should be OK if the files are compiled
with the "-O" optimisation level, though.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia