On 2/12/2012 3:12 AM, Steven D'Aprano wrote:
> NTFS by default uses the UTF-16 encoding, which means the actual bytes
> written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
> byte-order mark \xff\xfe).

That's what I meant. Those bytes will be interpreted consistently across all locales.
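
For anyone who wants to check the byte values quoted above, here is a minimal Python 3 sketch. The variable names are mine, and it only looks at the encoded bytes themselves; it makes no claim about how NTFS actually lays out directory entries on disk.

```python
import codecs

# The directory name from the quoted message, written with escapes so the
# snippet does not depend on the encoding of the source file.
name = "\u041d\u0430\u04e5\u0432"  # Наӥв

# UTF-16, little-endian, without a byte-order mark.
raw = name.encode("utf-16-le")
print(raw)  # b'\x1d\x040\x04\xe5\x042\x04'

# The same bytes with an explicit little-endian BOM (\xff\xfe) in front.
with_bom = codecs.BOM_UTF16_LE + raw

# Decoding names the codec explicitly, so the current locale is never
# consulted and the result is the same everywhere.
assert raw.decode("utf-16-le") == name
assert with_bom.decode("utf-16") == name  # the BOM selects the byte order
```

The "0" and "2" that show up in the middle of the byte string are just the low bytes 0x30 and 0x32 of U+0430 and U+0432, which happen to be printable ASCII.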

> Windows has two separate APIs, one for "wide" characters, the other for
> single bytes. Depending on which one you use, the directory will appear
> to be called Наӥв or 0å2.

Yes, and AFAIK, the wide API is the default. The other one only exists to support programs that don't support the wide API (generally, such programs were intended to be used on older platforms that lack that API).

> But in any case, we're not talking about the file name encoding. We're
> talking about the contents of files.

Okay then. As I stated, this has nothing to do with the OS since programs are free to interpret bytes any way they like.

--
CPython 3.2.2 | Windows NT 6.1.7601.17640
--
http://mail.python.org/mailman/listinfo/python-list