Ending/reducing bytecode compilation, loosening dependencies

2005-12-30 Thread Joe Wreschnig
Hi,

About a month ago Steve Langasek and I discussed the state of Python
packages on IRC, in particular the effects of bytecode compilation; the
effectiveness (or lack thereof) of it, and how it tightens Python
dependencies. I'd like to propose three changes to how Python modules
are handled.

All three can be summarized as: Python packages should not byte-compile
modules by default; it's a premature optimization that wastes time and
disk space, and it doesn't solve the problems it's meant to anyway.

1. Stop compiling .pyo files, entirely (I'm hoping for little argument
on this).

Rationale: .pyo files are a joke. They aren't optimized in any
meaningful sense; they just have assert statements removed. Examples for
several non-trivial files:

$ md5sum stock.pyc stock.pyo widgets.pyc widgets.pyo formats/_audio.pyc 
formats/_audio.pyo
5ca1a79bf036e9eddf97028c00f1d0c7  stock.pyc
5ca1a79bf036e9eddf97028c00f1d0c7  stock.pyo
f6c17acdf8043bb8524834f9a5f5c747  widgets.pyc
f6c17acdf8043bb8524834f9a5f5c747  widgets.pyo
dea672e99bb57f7e7585378886eb3cb0  formats/_audio.pyc
dea672e99bb57f7e7585378886eb3cb0  formats/_audio.pyo

They also aren't even loaded unless you run python with -O, which I
don't think any Python programs in Debian do.
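The identical checksums can be reproduced without any packaging tooling
at all. A minimal sketch (on modern Python the -O distinction moved into
compile()'s optimize argument, and .pyo files were later dropped
entirely, but the principle is the same: optimization only strips
asserts):

```python
import marshal

plain = "def f(x):\n    return x + 1\n"
asserting = "def f(x):\n    assert x >= 0\n    return x + 1\n"

def code_bytes(src, opt):
    # marshal.dumps of the code object is what the body of a
    # .pyc/.pyo file contains (after the header).
    return marshal.dumps(compile(src, "<mod>", "exec", optimize=opt))

# With no asserts in the source, "optimized" bytecode is byte-identical.
print(code_bytes(plain, 0) == code_bytes(plain, 1))
# With an assert, the only difference is its removal.
print(code_bytes(asserting, 0) == code_bytes(asserting, 1))
```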

How?: compileall.py:57,
-cfile = fullname + (__debug__ and 'c' or 'o')
+cfile = fullname + 'c'

2. Stop compiling .pyc files (this I expect to be contentious), unless a
package wants to.

Rationale: .pyc files have a minimal gain, and numerous failings.

Advantages of .pyc files:
* .pyc files make Python imports go marginally faster. However,
   for nontrivial Python programs, the import time is dwarfed
   by other startup code. Some quick benchmarks show about 20% gains
   for importing a .pyc over a .py. But even then, the wall-clock time
   is on the order of 0.5 seconds. Lars Wirzenius mentioned that
   this time matters for enemies-of-carlotta, and it probably also
   matters for some CGI scripts.
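A rough, self-contained way to see the effect (the module name "bigmod"
is made up for the test; a fresh interpreter process per run would be
more accurate than re-importing in one process, so treat the numbers as
indicative only):

```python
import importlib, os, shutil, sys, tempfile, time

# Build a deliberately large throwaway module.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "bigmod.py"), "w") as f:
    for i in range(2000):
        f.write("def f%d(x):\n    return x + %d\n" % (i, i))
sys.path.insert(0, tmp)

def time_import():
    sys.modules.pop("bigmod", None)   # force a real re-import
    importlib.invalidate_caches()
    t0 = time.perf_counter()
    importlib.import_module("bigmod")
    return time.perf_counter() - t0

cold = time_import()  # parses bigmod.py (and normally writes the .pyc)
warm = time_import()  # loads the cached bytecode, skipping the parse
print("cold: %.4fs  warm: %.4fs" % (cold, warm))
shutil.rmtree(tmp)
```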

* Generating them at compile-time means they won't accidentally
  get generated some other time.

Disadvantages:
* They waste disk space; each one is roughly as large as the source it
   was compiled from.

* It's still far too easy for modules to be regenerated for the
   wrong version of Python; just run the program as root.

* .pyc files are not really architecture independent. The integer
   constant 4294967296 will be a long in .pyc files compiled on 32 bit
   architectures, and an int when compiled on 64 bit architectures.
   The resulting module will run on both architectures, but won't
   behave in the same way as a module from that machine. To be fair,
   I don't know of any real-world examples that will break because
   of this.

* .pyc files result in strange bugs if they are not cleaned up
   properly, since Python will import them regardless of whether
   or not an equivalent .py is present.
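This hazard is easy to reproduce. A sketch with a throwaway module name
(on modern Python it applies to a .pyc placed directly on sys.path,
which is loaded as a "sourceless" module; __pycache__ files, by
contrast, are ignored once their source is gone):

```python
import importlib, os, sys, tempfile, py_compile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "stale.py")
with open(src, "w") as f:
    f.write("VALUE = 'from the orphaned bytecode'\n")

# Compile to the legacy location next to the source (not __pycache__).
py_compile.compile(src, cfile=os.path.join(tmp, "stale.pyc"))
os.remove(src)  # the .py is gone; only the .pyc remains

sys.path.insert(0, tmp)
importlib.invalidate_caches()
mod = importlib.import_module("stale")
print(mod.VALUE)  # the orphaned bytecode still imports happily
```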

* If we don't care about byte-compilation, the multi-version
   support suggested in 2.2.3 section 2 becomes much easier --
   just add that directory to sys.path (or use the existing
   unversioned /usr/lib/site-python). .pyc files are the rationale
   behind tight dependencies on Python versions, which is the last
   of my suggested changes.

Another note: Currently, Python policy is based around the assumption
that .pyc files are valid within a single minor Python revision. I don't
find any evidence to support this in the Python documentation. In fact,
the marshal module documentation specifically says there are no such
guarantees. However, I don't think this has ever been a problem in
practice (if it was, we wouldn't notice, because Python just ignores
invalid pyc files).
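That "just ignores invalid pyc files" behaviour hinges on the magic
number at the start of every .pyc, which changes between interpreter
versions. A quick check (importlib.util exposes the running
interpreter's magic number on modern Python; the 2005-era equivalent
lived in the imp module):

```python
import importlib.util, os, py_compile, tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "m.py")
with open(src, "w") as f:
    f.write("x = 1\n")
pyc = py_compile.compile(src)  # returns the path of the written .pyc

# Every .pyc starts with a 4-byte magic number; the interpreter
# silently ignores (and recompiles) any file whose magic doesn't
# match its own.
with open(pyc, "rb") as f:
    magic = f.read(4)
print(magic == importlib.util.MAGIC_NUMBER)
```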

How?: dh_python should not call compileall.py unless given some special
flag. Python policy 2.5 should change "should be generated" to "may be
generated." On the other hand, the removal code should be a "must" to
avoid littering the filesystem if .pyc files do get accidentally
generated.

I'm willing to write the patch for dh_python if there's agreement on
this.

The Python standard library should still compile .pyc files, because
this is a prerequisite for any program to make good use of .pyc files.
The problems don't apply here, because it's easy to keep the interpreter
and standard library in sync. 
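For reference, compileall is the whole mechanism in question; a
maintainer-script-style invocation is a one-liner (the directory here is
a throwaway stand-in, not a real installation path):

```python
import compileall, os, tempfile

tree = tempfile.mkdtemp()
with open(os.path.join(tree, "m.py"), "w") as f:
    f.write("x = 1\n")

# compile_dir walks the tree and byte-compiles every .py it finds;
# quiet=1 suppresses the per-file listing.
ok = compileall.compile_dir(tree, quiet=1)
entries = sorted(os.listdir(tree))
print(ok, entries)
```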

3. Python dependencies should be loosened (and here I expect a
flamewar).

Rationale: Python migrations in Debian suck, period. One reason for this
is that every Python program and module has a strict dependency on
python >> 2.x, << 2.x+1, so during a Python migration absolutely
everything must be rebuilt. But most pure-Python programs and modules
are upward-compatible, especially these days when Debian is a minor
version behind.

Tools like dh_python do make this easier, by making backporting (or
sideporting to e.g. Ubuntu) simply a rebuild. But why bother with even
that, when it's not necessary?

Without .pyc files, there's no reason for this tight dependency at all.
Even if we keep .pyc files, I think loosening this requirement is a good
idea.

Re: Ending/reducing bytecode compilation, loosening dependencies

2005-12-30 Thread Kenneth Pronovici
> About a month ago Steve Langasek and I discussed the state of Python
> packages on IRC, in particular the effects of bytecode compilation; the
> effectiveness (or lack thereof) of it, and how it tightens Python
> dependencies. I'd like to propose three changes to how Python modules
> are handled.

I have a couple of questions.

Do you intend this proposal to apply to Python libraries, or Python
applications, or both?  

I'm thinking that many applications (especially ones like EoC) would be
built for just a single version of Python anyway.  In this case, why
would it matter whether we have pre-compiled bytecode around?

What would you suggest doing about "hybrid" packages which are primarily
applications, but also want to make their modules available to other
Python programs?  Two examples here are pychecker and epydoc (both
maintained by me).  Right now, for those packages, I stick the .py, .pyc
and .pyo files in site-python, compile the modules for the default
version of Python, and then live with the inefficiency of recompiling
them for non-default Python versions and/or the possibility that root
will recompile them for the wrong version.  This isn't a great solution,
but it works.  However, some folks don't seem to like this solution very
much, and I've definitely gotten some pushback about the structure of
these packages.

Finally, what do you suggest doing with packages that contain both
pure-Python modules and C extensions?  It seems that any package which
contains a C extension is necessarily tied to a specific version of
Python, and so might as well have pre-compiled modules.  You didn't seem
to address this in your proposal, but maybe that's because you're
assuming that current policy is appropriate.

Thanks,

KEN

-- 
Kenneth J. Pronovici <[EMAIL PROTECTED]>

