John Machin wrote:

> On Feb 25, 12:00 am, Peter Otten <__pete...@web.de> wrote:
>> John Machin wrote:
>
>> > Your Python 2.x code should be TESTED before you poke 2to3 at it. In
>> > this case just trying to run or import the offending code file would
>> > have given an informative syntax error (you have declared the .py file
>> > to be encoded in UTF-8 but it's not).
>>
>> The problem is that Python 2.x accepts arbitrary bytes in string
>> constants.
>
> Ummm ... isn't that a bug? According to section 2.1.4 of the Python
> 2.7.1 Language Reference Manual: """The encoding is used for all
> lexical analysis, in particular to find the end of a string, and to
> interpret the contents of Unicode literals. String literals are
> converted to Unicode for syntactical analysis, then converted back to
> their original encoding before interpretation starts ..."""
>
> How do you reconcile "used for all lexical analysis" and "String
> literals are converted to Unicode for syntactical analysis" with the
> actual (astonishing to me) behaviour?

You are right. The current behaviour is probably an implementation accident
stemming from the assumption that

    s.decode("utf-8").encode("utf-8") == s

always holds. Other encodings (I tried cp1252) produce the expected
SyntaxError.
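
To make the round-trip assumption concrete, here is a minimal sketch. It is written for Python 3 and only exercises the codecs; it does not reproduce the Python 2 tokenizer, and the helper name round_trips is made up for illustration. It checks whether a byte string survives decode-then-encode under a given encoding.

```python
def round_trips(raw: bytes, encoding: str) -> bool:
    """Return True if `raw` decodes under `encoding` and re-encodes to the same bytes.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:
        # The bytes are not valid in this encoding, so no round trip is possible.
        return False


valid_utf8 = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
latin1_bytes = b"caf\xe9"                  # Latin-1 / cp1252 bytes, not valid UTF-8

print(round_trips(valid_utf8, "utf-8"))     # True
print(round_trips(latin1_bytes, "utf-8"))   # False: the assumption does not hold
print(round_trips(latin1_bytes, "cp1252"))  # True
print(round_trips(b"\x81", "cp1252"))       # False: 0x81 is undefined in cp1252
```

If the source is never actually decoded when the declared encoding is UTF-8, the second case would go unnoticed, which would be consistent with the behaviour described above; that reading is my assumption, not something checked against the interpreter source.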