Lauri Alanko added the comment:

The port is certainly not yet "complete" in any sense. I have only fixed the most obvious places where explicit conversion between ASCII/Unicode values and platform-specific characters is required. There are a number of remaining issues, some of which cannot be fixed without major overhauls. The point of this first release is just to allow other interested people to chime in, to test the patch, and to suggest what should be done with it. The latter has certainly happened. :)
I have no great interest in whether the patch ever gets incorporated into the main Python distribution. I do think, though, that it's a good idea to make the relationship between characters and Unicode values more explicit in the code in any case, and my patch shouldn't affect the behavior on any other platforms.

Guido's comment about networking code is quite accurate, but the problem is social, not technical: there is already networking code that assumes that 8-bit string literals represent ASCII strings, and there is already text-processing code that assumes that 8-bit string literals represent "text" as found in ordinary text files on the platform. There is no reliable way to make both kinds of code work on a platform whose native encoding is not ASCII-compatible. In this sense, it is indeed impossible to port Python 2.x to an EBCDIC platform "completely", so that all existing code would continue to do "the right thing" without modifications.

However, Py3k presents a fresh start, and one where this particular problem is gone, since string literals are no longer associated with a particular encoding, and bytes literals explicitly represent the ASCII values of the characters in the literal expression. Then text-processing code will likely use string literals, and it is easy to make the default encoding platform-specific when transferring data between local text files and string objects. As far as I can see, EBCDIC shouldn't pose any special problems then.

From what I read in PEP 3120 and the Py3k docs, there seems to be some confusion regarding source encoding issues. Firstly, Python source code is fundamentally _text_. For instance, a string literal is delimited by single quote or double quote characters. Characters themselves are abstract entities that have no inherent numeric values, although we can name them with e.g. Unicode code points, so we can say that the string delimiters are the characters represented by the code points U+0027 and U+0022.
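To make the distinction between abstract characters, code points, and platform octets concrete, here is a minimal Python 3 sketch. It is only an illustration under stated assumptions: the standard library ships no CP1047 codec as far as I know, so the related EBCDIC code page `cp037` is used as a stand-in, and the name `EBCDIC` is hypothetical.

```python
# Assumption: cp037 stands in for CP1047, which has no stdlib codec.
EBCDIC = "cp037"

# The string delimiters are abstract characters; only an encoding
# assigns them octet values.
for ch in ('"', "'"):
    code_point = ord(ch)                  # Unicode code point, platform-independent
    ascii_octet = ch.encode("ascii")[0]   # octet in an ASCII-compatible encoding
    ebcdic_octet = ch.encode(EBCDIC)[0]   # octet in the EBCDIC stand-in
    print(f"U+{code_point:04X}  ascii=0x{ascii_octet:02X}  ebcdic=0x{ebcdic_octet:02X}")

# A bytes literal always denotes ASCII values, whatever the platform...
wire_request = b"GET"
# ...whereas text has to be encoded explicitly for a given destination.
native_text = "GET".encode(EBCDIC)
```

Here `wire_request` is what networking code would send unchanged, while `native_text` is what the same three characters would look like in a local EBCDIC text file.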
What PEP 3120 specifies is a mechanism for mapping octet sequences into these abstract characters. If this is made part of the language specification, it presumably means that a conformant Py3k source file must start as UTF-8 at least until an encoding declaration is encountered. Further, a conformant Py3k implementation must accept such UTF-8 source files and decode them as specified in the PEP. So far so good.

However, there is nothing to prevent an implementation from providing (as an extension) a facility to allow _other_ kinds of source as well. "There is no room for platform-specific derivations" is an arbitrary restriction: there are certainly quite a number of ways to support both UTF-8 and CP1047 source on z/OS: for instance, the filesystem allows storing the encoding of a text file as metadata.

Moreover, there is a semantics-preserving mapping from UTF-8 source files to CP1047 source files: since non-ASCII characters can only appear in comments and string literals, and comments have no semantics, it suffices to \u-escape the exotic characters in string literals. Hence all Python source can be represented as native text on an EBCDIC platform.

Of course you can declare that support for such extensions would be heretical and no EBCDIC source file would be True Python Source and no EBCDIC implementation would be a True Python Implementation, but I don't really care. Python 3000 _can_ be ported to z/OS much better than 2.x, and it probably will, even if you don't like it. Oh the wonders of open source. :)

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1298>
__________________________________
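The \u-escaping argument above can be sketched in a few lines of Python 3. This is a simplified illustration, not part of the patch: `escape_non_ascii` is a hypothetical helper, `cp037` again stands in for CP1047 (which, as far as I know, has no standard-library codec), and raw string literals, where \u escapes are not interpreted, are deliberately ignored.

```python
import ast

# Assumption: cp037 stands in for CP1047, which has no stdlib codec.
EBCDIC = "cp037"

def escape_non_ascii(literal_source: str) -> str:
    """Replace each non-ASCII character in the source text of a
    (non-raw) string literal with an equivalent \\u or \\U escape."""
    pieces = []
    for ch in literal_source:
        cp = ord(ch)
        if cp < 128:
            pieces.append(ch)               # ASCII: keep as-is
        elif cp <= 0xFFFF:
            pieces.append(f"\\u{cp:04x}")   # BMP: 4-digit escape
        else:
            pieces.append(f"\\U{cp:08x}")   # beyond the BMP: 8-digit escape
    return "".join(pieces)

# Source text of a string literal, including its delimiters.
original = '"naïve café"'
escaped = escape_non_ascii(original)

# Both spellings denote the same string value...
same_value = ast.literal_eval(original) == ast.literal_eval(escaped)
# ...and the escaped spelling survives a round trip through EBCDIC.
native = escaped.encode(EBCDIC)
round_trip = native.decode(EBCDIC)
```

After escaping, the literal consists only of characters that the EBCDIC code page can represent, which is what makes the mapping semantics-preserving for string literals.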