Bugs item #436259, was opened at 2001-06-25 23:17
Message generated for change (Comment added) made by akuchling
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=436259&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Distutils
Group: Platform-specific
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ben Hutchings (wom-work)
Assigned to: Nobody/Anonymous (nobody)
Summary: [Windows] exec*/spawn* problem with spaces in args

Initial Comment:
DOS and Windows processes are not given an argument vector, as Unix processes are; instead they are given a command line and are expected to perform any necessary argument parsing themselves. Each C run-time library must convert command lines into argument vectors for the main() function, and if it includes exec* and spawn* functions then those must convert argument vectors into a command-line. Naturally, the various implementations differ in interesting ways.

The Visual C++ run-time library (MSVCRT) implementation of the exec* and spawn* functions is particularly awful in that it simply concatenates the strings with spaces in-between (see source file cenvarg.c), which means that arguments with embedded spaces are likely to turn into multiple arguments in the new process. Obviously, when Python is built using Visual C++, its os.exec* and os.spawn* functions behave in this way too.

MS prefers to work around this bug (see Knowledge Base article Q145937) rather than to fix it. Therefore I think Python must work around it too when built with Visual C++.

I experimented with MSVCRT and Cygwin (using the attached program print_args.c) and could not find a way to convert an argument vector into a command line that they would both convert back to the same argument vector, but I got close.

MSVCRT's parser requires spaces that are part of an argument to be enclosed in double-quotes. The double-quotes do not have to enclose the whole argument. Literal double-quotes must be escaped by preceding them with a backslash.
If an argument contains literal backslashes before a literal or delimiting double-quote, those backslashes must be escaped by doubling them. If there is an unmatched enclosing double-quote then the parser behaves as if there was another double-quote at the end of the line.

Cygwin's parser requires spaces that are part of an argument to be enclosed in double-quotes. The double-quotes do not have to enclose the whole argument. Literal double-quotes may be escaped by preceding them with a backslash, but then they count as enclosing double-quotes as well, which appears to be a bug. They may also be escaped by doubling them, in which case they must be enclosed in double-quotes; since MSVCRT does not accept this, it's useless. As far as I can see, literal backslashes before a literal double-quote must not be escaped and literal backslashes before an enclosing double-quote *cannot* be escaped. It's really quite hard to understand what its rules are for backslashes and double-quotes, and I think it's broken. If there is an unmatched enclosing double-quote then the parser behaves as if there was another double-quote at the end of the line.

Here's a Python version of a partial fix for use in nt.exec* and nt.spawn*. This function modifies argument strings so that the resulting command line will satisfy programs that use MSVCRT, and programs that use Cygwin if that's possible.

    def escape(arg):
        import re
        # If arg contains no space or double-quote then
        # no escaping is needed.
        if not re.search(r'[ "]', arg):
            return arg
        # Otherwise the argument must be quoted and all
        # double-quotes, preceding backslashes, and
        # trailing backslashes, must be escaped.
        def repl(match):
            if match.group(2):
                return match.group(1) * 2 + '\\"'
            else:
                return match.group(1) * 2
        return '"' + re.sub(r'(\\*)("|$)', repl, arg) + '"'

This could perhaps be used as a workaround for the problem. Unfortunately it would conflict with workarounds implemented at the Python level (which I have been using for a while).
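
The MSVCRT rules described above can be modelled in a few lines of Python to check the proposed escape() function. This is only a sketch under stated assumptions: parse_msvcrt is a hypothetical helper written from the rules as worded in this report (it is not the run-time library's code and it does not model Cygwin), and escape() is repeated from the report so that the block runs on its own.

```python
import re


def escape(arg):
    """Quote one argument; copied from the function proposed in the report."""
    if not re.search(r'[ "]', arg):
        return arg

    def repl(match):
        if match.group(2):
            return match.group(1) * 2 + '\\"'
        return match.group(1) * 2

    return '"' + re.sub(r'(\\*)("|$)', repl, arg) + '"'


def parse_msvcrt(cmdline):
    """Hypothetical model of the MSVCRT rules as described in the report."""
    args, current = [], []
    in_quotes = False
    have_arg = False
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == '\\':
            # Measure the run of backslashes starting here.
            j = i
            while j < n and cmdline[j] == '\\':
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # Before a double-quote, each pair is one literal backslash.
                current.append('\\' * (count // 2))
                if count % 2:
                    current.append('"')        # odd count: literal quote
                else:
                    in_quotes = not in_quotes  # even count: enclosing quote
                i = j + 1
            else:
                # Not followed by a quote: backslashes are literal.
                current.append('\\' * count)
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in ' \t' and not in_quotes:
            # Unquoted whitespace ends the current argument, if any.
            if have_arg:
                args.append(''.join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    # An unmatched quote is treated as closed at the end of the line.
    if have_arg:
        args.append(''.join(current))
    return args


args = ['prog.exe', 'a b', 'say "hi"', 'C:\\Program Files\\']
naive = ' '.join(args)                      # plain concatenation with spaces
quoted = ' '.join(escape(a) for a in args)  # with the proposed escaping
print(parse_msvcrt(naive))
print(parse_msvcrt(quoted) == args)
```

Under this model, joining the arguments with plain spaces splits 'a b' into two arguments, whereas the command line built with escape() parses back to the original list.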

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2006-12-21 09:48

Message:
Logged In: YES
user_id=11375
Originator: NO

Does this argument-line parsing weirdness still have relevance to current MSVC runtimes? Changing os.spawn() seems like a non-starter because it'll break existing code; the Python landscape has changed and subprocess.py is a higher-level, more useful way to run subprocesses (it has a MS C runtime-alike function, list2cmdline).

Unless someone submits a patch to change _nt_quote_args in distutils/spawn.py, I'll close this bug in a few months (the next time I visit the really old bugs).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-07-12 00:57

Message:
Logged In: YES
user_id=31435

distutils is *trying* to make spawn work the same way across platforms, via spawn.py. Help it! You're not likely to get anywhere with a change to the os.spawn family because you already know it will break code -- and it will break distutils in particular.

If you want to break code, this needs a PEP first: write up your "two stage" approach in a PEP and let the community have at it. If you read c.l.py, you should have a feel for how warmly that's likely to be received <wink>.

The bit about __argv was just FYI (you seemed unaware of it; I agree it's irrelevant to what you want to achieve).

----------------------------------------------------------------------

Comment By: Ben Hutchings (wom-work)
Date: 2001-07-12 00:30

Message:
Logged In: YES
user_id=203860

"Note that processes using WinMain can get at argc and argv under MSVC via including stdlib.h and using __argc and __argv instead."

This is irrelevant. The OS passes the command line into a process as a single string, which it makes accessible through the GetCommandLine() function. The argument vector received by main() or accessible as __argv is generated from this by the C run-time library.

"The right way to address this is to add more smarts to spawn.py in distutils"

I disagree. The right thing to do is to make these functions behave in the same way across platforms, as far as possible. Perhaps this could be done in two stages - in the first release, make the fix optional, and in the second, use it all the time.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-07-11 22:32

Message:
Logged In: YES
user_id=31435

Note that processes using WinMain can get at argc and argv under MSVC via including stdlib.h and using __argc and __argv instead.

I agree the space behavior sucks regardless. However, as you've discovered, there's nothing magical we can do about it without breaking the workarounds people have already developed on their own -- including distutils.

The right way to address this is to add more smarts to spawn.py in distutils, then press to adopt that in the std library (distutils already does *some* magical arg quoting on win32 systems, and could use your help to do a better job of it). Accordingly, I added [Windows] to the summary line, changed the category to distutils, and reassigned to Greg Ward for consideration.

----------------------------------------------------------------------
_______________________________________________
Python-bugs-list mailing list