On 02Aug2015 17:41, Steven D'Aprano <st...@pearwood.info> wrote:
On Sun, 2 Aug 2015 01:53 pm, Cameron Simpson wrote:
Maybe this should be over in python-ideas, since there is a proposal down
the bottom of this message. But first the background...

I've just wasted a silly amount of time debugging an issue that really I
know about, but had forgotten.

:-)


I have a number of modules which include a main() function, and down the
bottom this code:

  if __name__ == '__main__':
    sys.exit(main(sys.argv))

so that I have a convenient command line tool if I invoke the module
directly. I typically have tiny shell wrappers like this:

  #!/bin/sh
  exec python -m cs.app.maildb -- ${1+"$@"}

TL;DR: pertinent discussion around my proposal is lower down. First I digress into Steven's shell query.

I know this isn't really relevant to your problem, but why use "exec python"
instead of just "python"?

Saves a process. Who needs a shell process just hanging around waiting? Think of this as tail recursion optimisation.

And can you explain the -- ${1+"$@"} bit for somebody who knows just enough
sh to know that it looks useful but not enough to know exactly what it
does?

Ah.

In a modern shell one can just write "$@". I prefer portable code.

The more complicated version, which I use everywhere because it is portable, has to do with the behaviour of the $@ special variable. As you know, $* is the command line arguments as a single string, which is useless if you need to preserve them intact. "$@" is the command line arguments correctly quoted.

Unlike every other "$foo" variable, which produces a single string, "$@" produces all the command line arguments as separate strings. This is critical for passing them correctly to other commands. HOWEVER, in some older Bourne shells, if there are no arguments then "$@" produces a single empty string. Not desired. It is either a very old bug or a deliberate decision that no "$foo" shall utterly vanish.

Thus this:

 ${1+"$@"}

If you consult your nearest "man sh" in the PARAMETER SUBSTITUTION section, you will see that this only inserts "$@" if there is at least one argument, avoiding the "$@" => "" problem when there are no arguments. It does this by only inserting "$@" if $1 is defined. Sneaky and reliable.

In a modern POSIX shell a plain "$@" already behaves correctly with no arguments (a _bare_, unquoted $@ is still subject to word splitting), but I always use the incantation above for portability.
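To see the difference between these forms concretely, here is a small Python sketch that asks sh how many words each form expands to. It assumes a POSIX `sh` on the PATH, and `count_words` is a made-up helper name. A modern shell does not have the old empty-string bug, so it cannot demonstrate that case; it does show the splitting behaviour and that ${1+"$@"} vanishes entirely when there are no arguments.

```python
import subprocess

def count_words(expansion: str, *args: str) -> int:
    """Return how many words `expansion` expands to in `sh -c`, given positional args."""
    # f() only reports its argument count; the extra "sh" fills $0,
    # so the remaining args become $1, $2, ...
    script = 'f() { echo $#; }; f ' + expansion
    result = subprocess.run(["sh", "-c", script, "sh", *args],
                            capture_output=True, text=True, check=True)
    return int(result.stdout)

print(count_words('"$@"', "a b", "c"))       # 2: each argument kept intact
print(count_words('$@', "a b", "c"))         # 3: bare $@ is word-split
print(count_words('$*', "a b", "c"))         # 3: so is bare $*
print(count_words('"$*"', "a b", "c"))       # 1: all arguments joined into one string
print(count_words('${1+"$@"}', "a b", "c"))  # 2: same as "$@" because $1 is set
print(count_words('${1+"$@"}'))              # 0: expands to nothing with no arguments
```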

In short, invoke this module as a main program, passing in the command
line arguments. Very useful.

My problem?

When invoked this way, the module cs.app.maildb that is being executed is
actually the module named "__main__".
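The effect is easy to reproduce. The sketch below builds a throwaway package (the names `pkg` and `mod` are made up for illustration), runs `python -m pkg.mod` in a subprocess, and shows the module body executing twice: once as `__main__`, and again, as a separate module object, under its real name.

```python
import os
import subprocess
import sys
import tempfile
from typing import List

# Source for a throwaway module pkg/mod.py: it announces every execution of
# its body and, when run as the main program, imports itself by its real name.
MOD_SOURCE = '''\
print("executing as", __name__)
if __name__ == "__main__":
    import sys
    import pkg.mod
    print("same object?", pkg.mod is sys.modules["__main__"])
'''

def run_demo() -> List[str]:
    """Build pkg/mod.py in a temp dir, run "python -m pkg.mod", return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, "pkg")
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "mod.py"), "w") as f:
            f.write(MOD_SOURCE)
        env = dict(os.environ, PYTHONPATH=tmp)  # make sure "pkg" is importable
        result = subprocess.run([sys.executable, "-m", "pkg.mod"], cwd=tmp, env=env,
                                capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

for line in run_demo():
    print(line)
# executing as __main__
# executing as pkg.mod
# same object? False
```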

Yep. Now, what you could do in cs.app.maildb is this:

# untested, but should work
if __name__ == '__main__':
    import sys
    sys.modules['cs.app.maildb'] = sys.modules[__name__]
    sys.exit(main(sys.argv))

Yes, but that is ghastly and complicated. It also relies on the boilerplate at the bottom knowing the module name.
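Since Python 3.4 (PEP 451), a module started with `python -m` carries a `__spec__` whose `.name` is its real dotted name, so this boilerplate need not hardcode "cs.app.maildb". A minimal sketch of that variation follows; `alias_main_module` is a hypothetical helper, not a standard function, and like the snippet above it can only help if it runs before anything else imports the module under its real name.

```python
import importlib.machinery
import sys
import types

def alias_main_module(main_module: types.ModuleType) -> bool:
    """Register a module run via "python -m" under its real dotted name too.

    Returns True if sys.modules now maps that name to main_module.
    """
    spec = getattr(main_module, "__spec__", None)
    if spec is None:
        # Run as "python path/to/file.py" or interactively: no dotted name known.
        return False
    # setdefault: never replace a copy that something has already imported.
    return sys.modules.setdefault(spec.name, main_module) is main_module

# Typical use, as early as possible in the module:
#     alias_main_module(sys.modules[__name__])

# Quick demonstration with a stand-in module object (names are made up):
fake = types.ModuleType("__main__")
fake.__spec__ = importlib.machinery.ModuleSpec("demo.fakemod", None)
print(alias_main_module(fake))              # True
print(sys.modules["demo.fakemod"] is fake)  # True
```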

*** but that's the wrong solution ***

It is suboptimal. "Wrong" seems a stretch.

The problem here is that by the time cs.app.maildb runs, some other part of
cs or cs.app may have already imported it. The trick of setting the module
object under both names can only work if you can guarantee to run this
before importing anything that does a circular import of cs.app.maildb.

That can be done if it takes place in the Python interpreter itself. But there are side effects which need to be considered.

My initial objective is that:

 python -m cs.app.maildb

should import cs.app.maildb under the supplied name instead of "__main__", so that a recursive import does not instantiate a second module instance. That is, I think, a natural thing for users to expect from the above command line: "import cs.app.maildb, run its main program".

On further thought last night I devised the logic below to implement Python's "-m" option:

 # pseudocode, with values hardwired for clarity
 import sys
 M = new_empty_module(name='__main__', qualname='cs.app.maildb')
 sys.modules['cs.app.maildb'] = M
 M.execfile('/path/to/cs/app/maildb.py')   # you know what I mean...

The "qualname" above is an idea I thought of last night to let introspection cope with '__main__' and 'cs.app.maildb' at the same time, somewhat like the .__qualname__ attribute recently added to functions. Under this scheme a module would get both a __name__ and a __qualname__, normally the same, but with __name__ set to '__main__' in the "main program module" situation.

This should sidestep any issues with recursive imports by having the module in place in sys.modules ahead of the running of its code.
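Short of changing the interpreter, the proposed ordering can be approximated in pure Python with importlib. This is only a sketch under stated assumptions: it handles plain .py source modules, `run_module_as_main` is a hypothetical name, the `__qualname__` module attribute is the proposal's idea rather than an existing feature, and unlike the real runpy machinery it neither installs the module as sys.modules['__main__'] nor touches sys.argv.

```python
import importlib.util
import sys
import types

def run_module_as_main(modname: str) -> types.ModuleType:
    """Rough emulation of the proposed "-m" behaviour for a plain .py module."""
    spec = importlib.util.find_spec(modname)
    if spec is None or spec.loader is None:
        raise ImportError(f"No module named {modname!r}")
    module = importlib.util.module_from_spec(spec)  # new, empty module object
    module.__qualname__ = modname   # hypothetical attribute from the proposal
    module.__name__ = "__main__"    # the code sees itself as the main program...
    sys.modules[modname] = module   # ...but is registered under its real name first
    # Ask the loader for the code by its real name (file loaders reject other
    # names), then run it in the module's namespace.
    code = spec.loader.get_code(modname)
    if code is None:
        raise ImportError(f"No source code available for {modname!r}")
    exec(code, module.__dict__)
    return module
```

A module run this way that imports itself, directly or through a circular import, gets back the same object instead of a second copy.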

The right existing solution is to avoid having the same module do
double-duty as both runnable script and importable module.

I disagree. Supporting this double duty is, to me, a highly desirable feature. This is, in fact, a primary purpose of the present standard boilerplate.

I _like_ that: a single file, short and succinct.

In a package, that's easy. Here's your package structure:

cs
+-- __init__.py
+-- app
   +-- __init__.py
   +-- maildb.py

and possibly others. Every module that you want to be a runnable script
becomes a package with its own __main__.py file:

cs
+-- __init__.py
+-- __main__.py
+-- app
   +-- __init__.py
   +-- __main__.py
[...]

Yes, nicely separated, but massive structural overkill for simple things like single file modules.

and now you can call:

python -m cs
python -m cs.app
python -m cs.app.maildb

as needed. The __main__.py files look like this:

import sys

if __name__ == '__main__':
    import cs.app.maildb
    sys.exit(cs.app.maildb.main(sys.argv))

or as appropriate.
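To check that this layout avoids the double import, here is a sketch (the package and module names are hypothetical) that builds such a package in a temporary directory and runs `python -m` on it. The real code in tool.py executes once, under its real name, and a later import of it gets the same module back.

```python
import os
import subprocess
import sys
import tempfile
from typing import List

# Hypothetical package layout:
#   mypkg/__init__.py   (empty)
#   mypkg/__main__.py   (thin entry point)
#   mypkg/tool.py       (the real code, with a main() function)
FILES = {
    "__init__.py": "",
    "__main__.py": (
        "import sys\n"
        "import mypkg.tool\n"
        "sys.exit(mypkg.tool.main(sys.argv))\n"
    ),
    "tool.py": (
        'print("executing", __name__)\n'
        "\n"
        "def main(argv):\n"
        "    import mypkg.tool  # stands in for a circular import elsewhere\n"
        '    print("main running in", mypkg.tool.__name__)\n'
        "    return 0\n"
    ),
}

def run_package() -> List[str]:
    """Create mypkg in a temp dir, run "python -m mypkg", return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, "mypkg")
        os.mkdir(pkg_dir)
        for name, source in FILES.items():
            with open(os.path.join(pkg_dir, name), "w") as f:
                f.write(source)
        env = dict(os.environ, PYTHONPATH=tmp)  # make sure "mypkg" is importable
        result = subprocess.run([sys.executable, "-m", "mypkg"], cwd=tmp, env=env,
                                capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

for line in run_package():
    print(line)
# executing mypkg.tool
# main running in mypkg.tool
```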

Yes, it's a bit more work. If your package has 30 modules, and every one is
runnable, that's a lot more work. But if your package is that, um,
intricate, then perhaps it needs a redesign?

 [hg/css]fleet*> grep '__name__ == .__main__' cs/**/*.py|wc -l
       96

No, it is simply my personal kit. The design is ok for what it is. Pieces of it are slowly being published on PyPI as they become publishable (beta or better quality, proper distinfo metadata applied, checked to not import unpublished modules, not import gratuitous tissue paper modules, free of most debugging or off topic cruft, etc).

To be honest, the majority of those __main__ calls actually run the unit tests for that module, not a proper "main program". A better grep:

 [hg/css-nodedb]fleet*> grep 'main(sys.argv)' cs/**/*.py|wc -l
       14

says just 14. Far saner; those are modules/packages for which there really is an associated command line tool.

The major use-case for this feature is where you have a package, and you
want it to have a single entry point when running it as a script. (That
would be "python -m cs" in the example above.) But it can be used when you
have multiple entry points too.

For a single .py file, you can usually assume that when you are running it
as a stand alone script, there are no circular imports of itself:

# spam.py
import eggs

def main():
    print("spam running")

if __name__ == '__main__':
    main()

# eggs.py
import spam  # circular import

If that expectation is violated, then you can run into the trouble you
already did.

As described, that expectation was violated. In the normal course of affairs one rarely trips over it.

So...
* you can safely combine importable module and runnable script in
 the one file, provided the runnable script functionality doesn't
 depend on importing itself under the original name (either
 directly or indirectly);

* if you must violate that expectation, the safest solution is to
 make the module a package with a __main__.py file that contains
 the runnable script portion;

My proposal above is to solve this issue without requiring the breaking of a module into a multifile package just to address a counterintuitive edge case, and to avoid cognitive dissonance for Python users when they do traverse that edge case.

I want "python -m foo" to accomplish more closely what the naive user expects.

* if you don't wish to do that, you're screwed, and I think that the
 best you can do is program defensively by detecting the problem
 after the event and bailing out:

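 # untested
 import os
 import __main__
 import myactualfilename
 # modules have __file__; only packages have a (list-valued) __path__
 if os.path.samefile(__main__.__file__, myactualfilename.__file__):
     raise RuntimeError("same source file imported as __main__ and as myactualfilename")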

Nasty and defeatist! I rail against this mode of thought! :-)

Anyway, I'm about to raise my proposed implementation change higher up over on python-ideas with a plan to write a PEP if I don't get fundamental objections (i.e. "this breaks everything" versus your "you can work around it in these [cumbersome] ways").

Cheers,
Cameron Simpson <c...@zip.com.au>

"My manner of thinking, so you say, cannot be approved. Do you suppose I
care? A poor fool indeed is he who adopts a manner of thinking for others!
My manner of thinking stems straight from my considered reflections; it
holds with my existence, with the way I am made. It is not in my power to
alter it; and were it, I'd not do so." Donatien Alphonse Francois de Sade
--
https://mail.python.org/mailman/listinfo/python-list
