John Roth wrote:
> I'd like to suggest a different mechanism, at least for packages
> (top level scripts don't generate .pyc files anyway.) Put a system
> variable in the __init__.py file. Something like __obj__ = path
> would do nicely. Then when Python created the __init__.pyc file,
> it would insert a back link __src__ entry.
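(For concreteness, I read that as putting something like the following in a package's __init__.py -- the path below is just a made-up example, not part of the proposal:

    # mypkg/__init__.py
    # Directory where Python would write this package's .pyc files.
    __obj__ = '/var/tmp/pyc-cache/mypkg'

with Python then recording a corresponding __src__ back link when it writes out __init__.pyc.)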
I like the kernel of that proposal. One down-side is that you might wind up compiling at least __init__.py on every execution which imports the package (to extract the object directory variable __obj__). If you had automatic .pyc directory creation and updating of __obj__ for sub-packages, you would only take the hit of recompiling the topmost __init__.py in a package tree.

I'm not as sure about the use of the backlink -- it seems to introduce a bit of a chicken/egg problem: to import from a .pyc directory which knows where its .py directory is, the .pyc files had to get there in the first place. That implies a separate compilation phase at some point, and while I can envision a use case for that, I don't have a pressing need for it.

I'm also not sure about the extent of the changes required to the import/compile mechanism. It seems the importer would have to be able to defer writing out the .pyc file until after the execution of the body of the __init__.py module.

Finally, I think you're dismissing one of the discussed use-cases out of hand :) Although it is true that a top-level script will not generate a .pyc file, a slightly more generic use of the term "script" could encompass a top-level module and one or more sub-modules in the same directory. If the script is run infrequently enough and the sub-modules are small enough, in some cases I would certainly love to be able to tell Python "Please don't litter this directory with .pyc files."

Assuming the deferral of writing .pyc files until after module execution is not a problem (for all I know it already works this way, but I wouldn't know why it would), I think a slightly more fleshed out (but still very green) proposal might be as follows (a rough code sketch of the same logic appears after the list):

1) When a module is first imported, the importing module's globals are searched for an __obj__ identifier. The importer makes a local copy of this variable during the import:

       objdir = passed_globals.get('__obj__', None)

2) If a .pyc/.pyo file is found in the same directory as the corresponding .py file:

   a) if the .pyc/.pyo is newer, it is loaded and executed and we are done; or

   b) objdir is set to None to indicate that we should regenerate the .pyc/.pyo in situ.

   Step b) could be debated, but I think it would be very confusing to have an out-of-date .pyc file in a directory, with the "real" .pyc file elsewhere...

3) If this is a package import and objdir is a non-null string, objdir is updated to include the package name. Something like:

       if is_package and objdir:
           objdir = os.path.join(objdir, package_name)

4) If objdir is a non-null string and there is a newer readable .pyc/.pyo file in the directory indicated by objdir, that file is loaded and executed and we are done.

5) The source file is compiled into memory.

6) If objdir is not None, the globals of the newly created module are updated such that __obj__ = objdir.

7) The module body is executed, including performing any sub-imports.

8) The module's __obj__ is now examined to determine if and where to write the module's .pyc/.pyo file:

   - If __obj__ does not exist, write to the same directory as the .py file (same as current behavior).

   - If __obj__ exists, is a non-empty string, and is equal to objdir (i.e. it was not modified during module body execution), write to the named directory, creating the leaf directory if necessary (e.g. for package imports).

   - If __obj__ exists and is the empty string, do not create a .pyc file. This allows author suppression of .pyc file writing.

   - If __obj__ exists but is not equal to objdir, create the leaf directory if it does not exist, but do not write the .pyc file. This is an optimization for the case where __init__.py specifies a new package subdirectory -- why write out a .pyc file if you won't know where it is until you re-compile/re-execute the source .py file? You'll never be able to make use of that .pyc file, and you may not even have write privileges in the directory. Even so, we still want to create the directory, so that sub-package imports don't have to create multiple directory levels.
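To make steps 1-8 a little more concrete, here is a very rough (and untested) sketch of the lookup and write decisions in Python. None of this is meant as real importer code; the names passed_globals, is_package, package_name, py_path and module_globals are just hypothetical stand-ins for whatever the import machinery would actually have available, and .pyo handling is omitted for brevity:

    import os

    def find_pyc(py_path, passed_globals, is_package, package_name):
        """Steps 1-4: return (pyc_to_load_or_None, objdir)."""
        # Step 1: copy the importing module's __obj__, if any.
        objdir = passed_globals.get('__obj__', None)

        # Step 2: a .pyc next to the .py wins if it is newer; if it is
        # stale, fall back to regenerating it in situ (objdir = None).
        local_pyc = py_path + 'c'
        if os.path.exists(local_pyc):
            if os.path.getmtime(local_pyc) >= os.path.getmtime(py_path):
                return local_pyc, objdir          # 2a: load it, done
            objdir = None                         # 2b: regenerate in place

        # Step 3: package imports get their own subdirectory of objdir.
        if is_package and objdir:
            objdir = os.path.join(objdir, package_name)

        # Step 4: a newer readable .pyc in objdir can be loaded directly.
        if objdir:
            remote_pyc = os.path.join(objdir, os.path.basename(py_path) + 'c')
            if (os.path.exists(remote_pyc) and
                    os.path.getmtime(remote_pyc) >= os.path.getmtime(py_path)):
                return remote_pyc, objdir

        return None, objdir                       # steps 5-7: caller compiles/executes

    def pyc_write_dir(module_globals, objdir, py_path):
        """Step 8: decide where (if anywhere) to write the new .pyc."""
        if '__obj__' not in module_globals:
            return os.path.dirname(py_path)       # current behavior
        final = module_globals['__obj__']
        if not final:
            return None                           # empty string: author suppressed writing
        if final == objdir:
            if not os.path.isdir(final):
                os.makedirs(final)                # create the leaf directory
            return final
        # __obj__ was changed during module execution: create the directory
        # for the benefit of sub-imports, but skip writing this .pyc.
        if not os.path.isdir(final):
            os.makedirs(final)
        return None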
I think this mechanism would allow for the following:

- Control of package/subpackage .pyc file location with a single line of code in the topmost __init__ file, at the small cost of recompiling that __init__ file on every program execution.

- Control of _all_ .pyc file locations (unless overridden for a given package as described above) from a top-level script. In this case, regular .pyc files would wind up _inside_ the __obj__ directory of the package which first imported them, and package .pyc files would wind up in a subdirectory underneath the directory of the package which first imported them. This is not a major issue for scripts which follow a consistent import execution flow, but it could be surprising to a first-time user of the control mechanism, and could increase disk space usage a tiny bit.

- Suppression of .pyc file writing from a top-level script by setting __obj__ = ''. Preexisting .pyc files would be used (and overwritten if the corresponding .py is newer), but .pyc files would not be generated if they did not already exist.

In addition, it might be possible to use this mechanism to enable generation of .pyc files for zip-imported packages (as with regular packages, the __init__.pyc file itself would not be written out, but it could define where to look for the other .pyc files). This could potentially be a huge performance gain for some zip imports.

> Actually, the PEP states that if the environment variable does
> not specify a directory then it does not generate a .pyc file.
> Any entry that is not a directory would do, such as some special
> characters that are illegal in file and directory names.

It is my understanding (the capabilities of bash and other shells notwithstanding) that the only truly "illegal" filename characters under POSIX are the slash and the null character. So, for a full path (as opposed to a single name), the only illegal character would be the null character, which is probably pretty hard/ugly to inject into an environment variable from a shell under either Windows or Linux. I suppose you could try to usurp special combinations, such as "//", except I think Microsoft has already done that :) Even if you found a combination that Microsoft didn't claim, I would personally find it very unPythonic to document the required use of something like "badname//".

In any case, if we are re-thinking the PEP, and we find a use-case for something like your __obj__ proposal but not such a good use-case for the environment variable, I would recommend leaving the environment variable out of the PEP, since its presence would add a security concern for people who run Python scripts under sudo once they upgrade to a version of Python which honors the environment variable.

Regards,
Pat

--
http://mail.python.org/mailman/listinfo/python-list