Hi, I've noticed that the encoding of non-ASCII filenames can be inconsistent between platforms when using the built-in open() function to create files.
For example, on an Ubuntu 10.04.4 LTS box, the character u'ş' (u'\u015f') gets encoded as u'ş' (u's\u0327'). Note how the two characters look exactly the same but are encoded differently: the original character is a single code point (u'\u015f'), while the name that ends up on the file system is a combination of two code points, the letter 's' followed by a combining cedilla (u's\u0327'). (You can learn more about combining diacritics in [1].) On the Mac, however, the original encoding is always preserved. I've pasted a minimal snippet of what I'm doing at the end of this message.

This issue was also discussed in a blog post by Ned Batchelder [2]. One suggested approach is to normalize the filename; however, this could result in a loss of information (what if, for example, the original filename already contained combining diacritics and we wanted to preserve them?). Ideally, the original encoding would be preserved. Is that possible, or is it completely out of Python's control?

Thanks a lot,
Julien

[1] http://en.wikipedia.org/wiki/Combining_diacritic#Unicode_ranges
[2] http://nedbatchelder.com/blog/201106/filenames_with_accents.html
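Here is the snippet I mentioned above. It's just a rough sketch of how I'm reproducing the behaviour; the file name and the NFC normalization are examples I picked for illustration, not a proposed fix:

import os
import unicodedata

name = u'\u015f.txt'     # LATIN SMALL LETTER S WITH CEDILLA, precomposed form

open(name, 'w').close()  # create the file with the built-in open()

for entry in os.listdir(u'.'):
    if entry.endswith(u'.txt'):
        # On the Ubuntu box this prints u's\u0327.txt' (decomposed);
        # on the Mac it comes back unchanged as u'\u015f.txt'.
        print(repr(entry))
        # Normalizing both sides makes the comparison pass, but it also
        # erases the precomposed/decomposed distinction I'd like to keep.
        print(unicodedata.normalize('NFC', entry) ==
              unicodedata.normalize('NFC', name))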