> > Your problem is, I think, that you think the magic of decoding source
> > code from the byte sequence into unicode happens in exec or eval. It
> > doesn't. It happens in between reading the file and passing the
> > contents of the file to exec or eval.

I think you are wrong here. The decoding of the source happens inside
eval. Here is the proof:

s = 'u"' + '\xdb' + '"'
print eval(s) == eval("# -*- coding: iso8859-2\n" + s)
# prints False, indicating that the decoding of the string expression
# happened inside eval!

It can also be proven that eval does not use the 'ascii' codec for its
default decoding:

'\xdb'.decode('ascii')  # this will raise a UnicodeDecodeError

eval() somehow decoded the passed expression, no question about it. It
did not use 'ascii', nor 'latin2', but something else. Why is that? Why
is there a particular encoding hard coded into eval? Which encoding is
it? (I could not decide which one, since '\xdb' will be the same in
latin1, latin3, latin4 and probably many others.)

I suspected that eval was going to use the same encoding that the
Python source file/console had at the point of execution, but this is
not true: the following program prints u'\xdb' instead of u'\u0170':

<snip>
# -*- coding: iso8859-2 -*-
s = '\xdb'
expr = 'u"' + s + '"'
print repr(eval(expr))
</snip>
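For comparison, decoding the expression explicitly before passing it to
eval seems to remove the ambiguity, because eval then receives a unicode
object and, as far as I can tell, no implicit source decoding is left to
it. A minimal Python 2 sketch of my own (assuming, as above, that byte
0xdb is meant as iso8859-2, i.e. U+0170):

# Minimal sketch: decode the expression ourselves, so eval() gets a
# unicode object instead of guessing an encoding for a byte string.
s = '\xdb'                  # the byte 0xdb, written as an escape
expr = 'u"' + s + '"'       # source text of a unicode literal

print repr(eval(expr))                      # u'\xdb'  -- eval's own default decoding
print repr(eval(expr.decode('iso8859-2')))  # u'\u0170' -- decoding made explicit

This still does not answer which default encoding eval applies to a
plain byte string, but it at least makes the intended decoding explicit.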
s = 'u"' + '\xdb' + '"' print eval(s) == eval( "# -*- coding: iso8859-2\n" + s) # prints False, indicating that the decoding of the string expression happened inside eval! It can also be prooven that eval does not use 'ascii' codec for default decoding: '\xdb'.decode('ascii') # This will raise an UnicodeDecodeError eval() somehow decoded the passed expression. No question. It did not use 'ascii', nor 'latin2' but something else. Why is that? Why there is a particular encoding hard coded into eval? Which is that encoding? (I could not decide which one, since '\xdb' will be the same in latin1, latin3, latin4 and probably many others.) I suspected that eval is going to use the same encoding that the python source file/console had at the point of execution, but this is not true: the following program prints u'\xdb' instead of u'\u0170': <snip> # -*- coding iso8859-2 -*- s = '\xdb' expr = 'u"' + s +'"' print repr(eval(expr)) </snip> Regards, Laszlo -- http://mail.python.org/mailman/listinfo/python-list