Bugs item #1306484, was opened at 2005-09-28 06:49
Message generated for change (Comment added) made by wigy
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1306484&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Parser/Compiler
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Vágvölgyi Attila (wigy)
Assigned to: Martin v. Löwis (loewis)
Summary: compile() converts "filename" parameter to StringType

Initial Comment:
The builtin compile() signature looks like:

compile(string, filename, kind[, flags[, dont_inherit]])

The string parameter can be either StringType or UnicodeType, but the filename parameter is converted to StringType, so if the unicode object passed contains non-ASCII characters, compile() raises UnicodeEncodeError.

This can be an issue on filesystems with utf-8 filenames, or when using non-English names to make backtraces more readable.

The attached file contains a unit test that will succeed when the bug is resolved.

I saw the error in 2.3 and 2.4; maybe it is there in all releases?

----------------------------------------------------------------------

>Comment By: Vágvölgyi Attila (wigy)
Date: 2005-09-29 10:29

Message:
Logged In: YES
user_id=156682

loewis, I confess I could not understand a word.

But as I see it, a completely unicode internal filename representation would have some advantages on systems that have multiple filesystems mounted with different encodings, or that simply have utf-8 filesystems (no 'ascii', 'replace', so that two filenames differing only in accents stay distinct).

I agree with Joel Spolsky (http://www.joelonsoftware.com/articles/Unicode.html), and I think that if choosing unicode were easier in a language, then most l10n problems would be solved. I understand that coding unicode in C is a pain.
Imagine - theoretically - if a literal like "hello" automatically meant a unicode object in Python, and you had to write s"hello" to make a literal string object encoded in a way that some environmental settings (or maybe the PEP 0263 header of the specific source file?) determine, so you have control over what happens. Imagine the case when there is a latin1 and a utf-8 partition mounted, and the console is latin2! Life would be much, much easier for a non-American programmer if she had to be aware from the first moment that she is in an international environment.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2005-09-29 08:34

Message:
Logged In: YES
user_id=849994

Sounds sound. :)

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-09-29 08:20

Message:
Logged In: YES
user_id=21627

Why couldn't co_filename just be the Unicode string? I think one would have to change:
- code_repr, to convert the filename into a byte string (preferably using 'ascii', 'replace')
- tb_printinternal (not sure what to do here)
- code_new, to accept either strings or unicode strings
- builtin_compile, which probably indeed needs to convert the string using the file system encoding, and then patch the resulting code object to point to the unicode object originally passed (unless we can accept more pythonrun functions).

----------------------------------------------------------------------

Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-09-28 14:54

Message:
Logged In: YES
user_id=1188172

Should compile() use the Py_FileSystemEncoding?

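
----------------------------------------------------------------------

Editorial note: the attached unit test is not reproduced in this message. The following is only a minimal sketch of the kind of check the report describes, under stated assumptions. It assumes a Python 3 interpreter, where compile() keeps a text filename unchanged; on 2.3 and 2.4 the same compile() call is the one reported above to raise UnicodeEncodeError. The filename and the helper names are hypothetical, chosen for illustration. The last line shows what an 'ascii', 'replace' conversion (as suggested above for code_repr) would do to such a name.

```python
import sys

# Hypothetical non-ASCII script name, written with escapes so the example
# does not depend on the encoding of the file it is saved in.
FILENAME = u"\xe1rv\xedzt\u0171r\u0151.py"


def compile_with_filename(source, filename):
    """Compile source in 'exec' mode, passing the filename through as given."""
    return compile(source, filename, "exec")


def innermost_traceback_filename(code):
    """Run a code object that raises; return the filename of the innermost frame."""
    try:
        exec(code, {})
    except Exception:
        tb = sys.exc_info()[2]
        # Walk to the last traceback entry, which belongs to the compiled code.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return tb.tb_frame.f_code.co_filename
    return None


# The filename should survive compilation unchanged ...
ok_code = compile_with_filename("x = 1\n", FILENAME)

# ... and should be what a traceback reports for code that fails.
failing_code = compile_with_filename("raise ValueError('boom')\n", FILENAME)
reported_name = innermost_traceback_filename(failing_code)

# Lossy byte-string form: every non-ASCII character becomes '?'.
repr_safe_name = FILENAME.encode("ascii", "replace")
```

If the filename is preserved, ok_code.co_filename and reported_name both equal FILENAME. The 'ascii', 'replace' form cannot tell apart two names that differ only in accented characters, which is the drawback wigy mentions above.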