John Roth wrote:
> I'd like to suggest a different mechanism, at least for packages
> (top level scripts don't generate .pyc files anyway.) Put a system
> variable in the __init__.py file. Something like __obj__ = path
> would do nicely. Then when Python created the __init__.pyc file,
> it would insert a back link __src__ entry.
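(For concreteness, I read that as putting something like the following in a package's __init__.py -- the path below is just a made-up example, not part of the proposal:

    # mypkg/__init__.py
    # Directory where Python would write this package's .pyc files.
    __obj__ = '/var/tmp/pyc-cache/mypkg'

with Python then recording a corresponding __src__ back link when it writes out __init__.pyc.)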
I like the kernel of that proposal. One down-side is that you might wind up compiling at least __init__.py on every execution which imports the package (to extract the object directory variable __obj__). If you had automatic .pyc directory creation and updating of __obj__ for sub-packages, you would only take the hit of recompiling the topmost __init__.py in a package tree.

I'm not as sure about the use of the backlink -- it seems to introduce a bit of a chicken/egg problem: to import from a .pyc directory which knows where its .py directory is, the .pyc files had to get there in the first place. That implies a separate compilation phase at some point, and while I can envision a use case for that, I don't have a pressing need for it.

I'm also not sure about the extent of the changes required to the import/compile mechanism. It seems the importer would have to be able to defer writing out the .pyc file until after the execution of the body of the __init__.py module.

Finally, I think you're dismissing one of the discussed use-cases out of hand :) Although it is true that a top-level script will not generate a .pyc file, a slightly more generic use of the term "script" could encompass a top-level module and one or more sub-modules in the same directory. If the script is run infrequently enough and the sub-modules are small enough, in some cases I would certainly love to be able to tell Python "Please don't litter this directory with .pyc files."

Assuming the deferral of writing .pyc files until after module execution is not a problem (for all I know it already works this way, but I wouldn't know why it would), I think a slightly more fleshed out (but still very green) proposal might be as follows (a rough code sketch of the same logic appears after the list):

1) When a module is first imported, the importing module's globals are searched for an __obj__ identifier. The importer makes a local copy of this variable during the import:

       objdir = passed_globals.get('__obj__', None)

2) If a .pyc/.pyo file is found in the same directory as the corresponding .py file:

   a) if the .pyc/.pyo is newer, it is loaded and executed and we are done; or

   b) objdir is set to None to indicate that we should regenerate the .pyc/.pyo in situ.

   Step b) could be debated, but I think it would be very confusing to have an out-of-date .pyc file in a directory, with the "real" .pyc file elsewhere...

3) If this is a package import and objdir is a non-null string, objdir is updated to include the package name. Something like:

       if is_package and objdir:
           objdir = os.path.join(objdir, package_name)

4) If objdir is a non-null string and there is a newer readable .pyc/.pyo file in the directory indicated by objdir, that file is loaded and executed and we are done.

5) The source file is compiled into memory.

6) If objdir is not None, the globals of the newly created module are updated such that __obj__ = objdir.

7) The module body is executed, including performing any sub-imports.

8) The module's __obj__ is now examined to determine if and where to write the module's .pyc/.pyo file:

   - If __obj__ does not exist, write to the same directory as the .py file (same as current behavior).

   - If __obj__ exists, is a non-empty string, and is equal to objdir (i.e. it was not modified during module body execution), write to the named directory, creating the leaf directory if necessary (e.g. for package imports).

   - If __obj__ exists and is the empty string, do not create a .pyc file. This allows author suppression of .pyc file writing.

   - If __obj__ exists but is not equal to objdir, create the leaf directory if it does not exist, but do not write the .pyc file. This is an optimization for the case where __init__.py specifies a new package subdirectory -- why write out a .pyc file if you won't know where it is until you re-compile/re-execute the source .py file? You'll never be able to make use of that .pyc file, and you may not even have write privileges in the directory. Even so, we still want to create the directory, so that sub-package imports don't have to create multiple directory levels.
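To make steps 1-8 a little more concrete, here is a very rough (and untested) sketch of the lookup and write decisions in Python. None of this is meant as real importer code; the names passed_globals, is_package, package_name, py_path and module_globals are just hypothetical stand-ins for whatever the import machinery would actually have available, and .pyo handling is omitted for brevity:

    import os

    def find_pyc(py_path, passed_globals, is_package, package_name):
        """Steps 1-4: return (pyc_to_load_or_None, objdir)."""
        # Step 1: copy the importing module's __obj__, if any.
        objdir = passed_globals.get('__obj__', None)

        # Step 2: a .pyc next to the .py wins if it is newer; if it is
        # stale, fall back to regenerating it in situ (objdir = None).
        local_pyc = py_path + 'c'
        if os.path.exists(local_pyc):
            if os.path.getmtime(local_pyc) >= os.path.getmtime(py_path):
                return local_pyc, objdir          # 2a: load it, done
            objdir = None                         # 2b: regenerate in place

        # Step 3: package imports get their own subdirectory of objdir.
        if is_package and objdir:
            objdir = os.path.join(objdir, package_name)

        # Step 4: a newer readable .pyc in objdir can be loaded directly.
        if objdir:
            remote_pyc = os.path.join(objdir, os.path.basename(py_path) + 'c')
            if (os.path.exists(remote_pyc) and
                    os.path.getmtime(remote_pyc) >= os.path.getmtime(py_path)):
                return remote_pyc, objdir

        return None, objdir                       # steps 5-7: caller compiles/executes

    def pyc_write_dir(module_globals, objdir, py_path):
        """Step 8: decide where (if anywhere) to write the new .pyc."""
        if '__obj__' not in module_globals:
            return os.path.dirname(py_path)       # current behavior
        final = module_globals['__obj__']
        if not final:
            return None                           # empty string: author suppressed writing
        if final == objdir:
            if not os.path.isdir(final):
                os.makedirs(final)                # create the leaf directory
            return final
        # __obj__ was changed during module execution: create the directory
        # for the benefit of sub-imports, but skip writing this .pyc.
        if not os.path.isdir(final):
            os.makedirs(final)
        return None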
I think this mechanism would allow for the following:

- Control of package/subpackage .pyc file location with a single line of code in the topmost __init__ file, at the small cost of recompiling that __init__ file on every program execution.

- Control of _all_ .pyc file locations (unless overridden for a given package as described above) from a top-level script. In this case, regular .pyc files would wind up _inside_ the __obj__ directory of the package which first imported them, and package .pyc files would wind up in a subdirectory underneath the directory of the package which first imported them. This is not a major issue for scripts which follow a consistent import execution flow, but it could be surprising to a first-time user of the control mechanism, and could increase disk space usage a tiny bit.

- Suppression of .pyc file writing from a top-level script by setting __obj__ = ''. Preexisting .pyc files would be used (and overwritten if the corresponding .py is newer), but .pyc files would not be generated if they did not already exist.

In addition, it might be possible to use this mechanism to enable generation of .pyc files for zip-imported packages (as with regular packages, the __init__.pyc file itself would not be written out, but it could define where to look for the other .pyc files). This could potentially be a huge performance gain for some zip imports.

> Actually, the PEP states that if the environment variable does
> not specify a directory then it does not generate a .pyc file.
> Any entry that is not a directory would do, such as some special
> characters that are illegal in file and directory names.

It is my understanding (the capabilities of bash and other shells notwithstanding) that the only truly "illegal" filename characters under POSIX are the slash and the null character. So, for a full path (as opposed to a single name), the only illegal character would be the null character, which is probably pretty hard/ugly to inject into an environment variable from a shell under either Windows or Linux. I suppose you could try to usurp special combinations, such as "//", except I think Microsoft has already done that :) Even if you found a combination that Microsoft didn't claim, I would personally find it very unPythonic to document the required use of something like "badname//".

In any case, if we are re-thinking the PEP, and we find a use-case for something like your __obj__ proposal but not such a good use-case for the environment variable, I would recommend leaving the environment variable out of the PEP, since its presence would add a security concern for people who run Python scripts under sudo once they upgrade to a version of Python which honors the environment variable.

Regards,
Pat

--
http://mail.python.org/mailman/listinfo/python-list