Hi,

About a month ago Steve Langasek and I discussed the state of Python packages on IRC, in particular the effects of bytecode compilation: its effectiveness (or lack thereof), and how it tightens Python dependencies. I'd like to propose three changes to how Python modules are handled.
All three can be summarized as: Python should not compile stuff by default; this is premature optimization that wastes time and disk space, and doesn't solve the problems anyway.

1. Stop compiling .pyo files, entirely (I'm hoping for little argument on this).

Rationale: .pyo files are a joke. They aren't optimized in any meaningful sense; they just have asserts removed. Examples for several non-trivial files:

$ md5sum stock.pyc stock.pyo widgets.pyc widgets.pyo formats/_audio.pyc formats/_audio.pyo
5ca1a79bf036e9eddf97028c00f1d0c7  stock.pyc
5ca1a79bf036e9eddf97028c00f1d0c7  stock.pyo
f6c17acdf8043bb8524834f9a5f5c747  widgets.pyc
f6c17acdf8043bb8524834f9a5f5c747  widgets.pyo
dea672e99bb57f7e7585378886eb3cb0  formats/_audio.pyc
dea672e99bb57f7e7585378886eb3cb0  formats/_audio.pyo

They also aren't even loaded unless you run python with -O, which I don't think any Python program in Debian does.

How?: compileall.py:57,
-        cfile = fullname + (__debug__ and 'c' or 'o')
+        cfile = fullname + 'c'

2. Stop compiling .pyc files, unless a package wants to (this I expect to be contentious).

Rationale: .pyc files have a minimal gain and numerous failings.

Advantages of .pyc files:

* .pyc files make Python imports go marginally faster. However, for nontrivial Python programs, the import time is dwarfed by other startup code. Some quick benchmarks show about a 20% gain for importing a .pyc over a .py, but even then the wall-clock time is on the order of 0.5 seconds. Lars Wirzenius mentioned that this time matters for enemies-of-carlotta, and it probably also matters for some CGI scripts.

* Generating them at compile time means they won't accidentally get generated some other time.

Disadvantages:

* They waste disk space; they use about as much as the code itself.

* It's still far too easy for modules to be regenerated for the wrong version of Python; just run the program as root.

* .pyc files are not really architecture-independent.
The integer constant 4294967296 will be a long in .pyc files compiled on 32-bit architectures, and an int when compiled on 64-bit architectures. The resulting module will run on both architectures, but won't behave in the same way as a module compiled on that machine. To be fair, I don't know of any real-world examples that break because of this.

* .pyc files result in strange bugs if they are not cleaned up properly, since Python will import them regardless of whether an equivalent .py is present.

* If we don't care about byte-compilation, the multi-version support suggested in 2.2.3 section 2 becomes much easier -- just add that directory to sys.path (or use the existing unversioned /usr/lib/site-python). .pyc files are the rationale behind tight dependencies on Python versions, which is the last of my suggested changes.

Another note: Currently, Python policy is based on the assumption that .pyc files are valid within a single minor Python revision. I can't find any evidence to support this in the Python documentation; in fact, the marshal module documentation specifically says there are no such guarantees. However, I don't think this has ever been a problem in practice (if it were, we wouldn't notice, because Python just ignores invalid .pyc files).

How?: dh_python should not call compileall.py unless given some special flag. Python policy 2.5 should change "should be generated" to "may be generated." On the other hand, the removal code should be a "must," to avoid littering the filesystem if .pyc files do get accidentally generated. I'm willing to write the patch for dh_python if there's agreement on this.

The Python standard library should still compile .pyc files, because this is a prerequisite for any program to make good use of them. The problems don't apply here, because it's easy to keep the interpreter and standard library in sync.

3. Python dependencies should be loosened (and here I expect a flamewar).
Rationale: Python migrations in Debian suck, period. One reason for this is that every Python program and module has a strict dependency on python >> 2.x, << 2.x+1, so during a Python migration absolutely everything must be rebuilt. But most pure-Python programs and modules are upward-compatible, especially these days when Debian is a minor version behind. Tools like dh_python do make this easier, by turning a backport (or sideport to e.g. Ubuntu) into a simple rebuild. But why bother with even that, when it's not necessary?

Without .pyc files, there's no reason for this tight dependency at all. Even if we keep .pyc files, I think loosening this requirement is a good idea. Programs will still run perfectly fine with mis-versioned .pyc files; the worst we'll see is slightly longer startup times.

How?: Strike the third paragraph from 3.1.1. This would also negate the fifth paragraph, which outlines a hypothetical, overcomplicated solution to the same problem.
-- 
Joe Wreschnig <[EMAIL PROTECTED]>
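P.S. The identical-checksums observation from point 1 is easy to reproduce with a short script. This is a minimal sketch assuming a current Python 3 interpreter, where the old .pyc/.pyo split became py_compile's optimize levels; the module sources and file names are made up for illustration:

```python
import hashlib
import os
import py_compile
import tempfile

def compile_md5(src_path, optimize, tag):
    """Byte-compile src_path at the given optimize level; return the md5 of the bytecode."""
    out = py_compile.compile(src_path, cfile=src_path + tag, optimize=optimize)
    with open(out, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

tmp = tempfile.mkdtemp()

# A module with no asserts: the "optimized" bytecode is byte-for-byte identical.
plain = os.path.join(tmp, "plain.py")
with open(plain, "w") as f:
    f.write("x = 1\ndef f():\n    return x\n")
print(compile_md5(plain, 0, "c0") == compile_md5(plain, 1, "c1"))    # True

# A module with an assert: removing it is the only "optimization" performed.
checked = os.path.join(tmp, "checked.py")
with open(checked, "w") as f:
    f.write("x = 1\ndef f():\n    assert x == 1\n    return x\n")
print(compile_md5(checked, 0, "c0") == compile_md5(checked, 1, "c1"))  # False
```

(Compiling the same source file twice keeps the embedded mtime identical, so any hash difference comes from the bytecode itself.)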
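P.P.S. The stale-.pyc hazard from point 2 can be demonstrated the same way. A minimal sketch, again assuming current Python 3 (which still imports a sourceless .pyc sitting next to where the module's source used to be) and a made-up module name:

```python
import os
import py_compile
import sys
import tempfile

tmp = tempfile.mkdtemp()

# Write a module and byte-compile it beside the source (the legacy .pyc location).
src = os.path.join(tmp, "leftover.py")
with open(src, "w") as f:
    f.write("VALUE = 1\n")
py_compile.compile(src, cfile=os.path.join(tmp, "leftover.pyc"))

# Remove the source, as a package upgrade that forgets to clean up .pyc files would.
os.remove(src)

# The orphaned bytecode still imports as if nothing had happened.
sys.path.insert(0, tmp)
import leftover
print(leftover.VALUE)  # 1
```

This is exactly the "strange bugs" failure mode: the .py is long gone, but Python keeps running the leftover bytecode.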