On Wed, 01 Dec 2010 02:14:09 +0000, MRAB wrote:

> If the filenames are to be shown to a user then there needs to be a
> mapping between bytes and glyphs. That's an encoding. If different
> users use different encodings then exchange of textual data becomes
> difficult.
OTOH, the exchange of binary data is unaffected. In the worst case,
users see a few wrong glyphs, but the software doesn't care.

> That's where encodings which can be used globally come in.
> By the time Python 4 is released I'd be surprised if Unix hadn't
> standardised on a single encoding like UTF-8.

That's probably not a serious option in parts of the world which don't
use a Latin-based alphabet, i.e. outside western Europe and its former
colonies. In countries with non-Latin alphabets, existing encodings are
often too heavily entrenched. There's also a lot of legacy software
which can only handle unibyte encodings, and not much incentive to fix
it if 98% of your market can get by with an ISO-8859-<whatever> locale
(making software work in e.g. CJK locales often requires a lot more
work than just dealing with encodings).

And it doesn't help that Windows has negligible support for UTF-8. It's
either UTF-16-LE (i.e. the in-memory format dumped directly to file) or
one of Microsoft's non-standard encodings. At least the latter are
mostly compatible with the corresponding ISO-8859-* encoding.

Finally, ISO-8859-* decoding can't fail. The result might be complete
gibberish, but converting bytes to text and back again won't lose
information.

-- 
http://mail.python.org/mailman/listinfo/python-list
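P.S. A minimal Python 3 illustration of that last point: latin-1
(ISO-8859-1) assigns a code point to every byte value, so decoding
arbitrary bytes can never raise, and re-encoding restores the original
bytes exactly, whereas UTF-8 rejects byte sequences that aren't valid
UTF-8:

```python
# Arbitrary binary data -- deliberately not valid UTF-8.
data = bytes(range(256))

# latin-1 maps each byte b to code point U+00b, so decoding always
# succeeds and encoding the result gives back the same bytes.
text = data.decode("latin-1")
assert text.encode("latin-1") == data

# UTF-8 decoding, by contrast, can fail on arbitrary bytes.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)
```

The result of the latin-1 decode may be gibberish if the bytes were
really in some other encoding, but no information is lost.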