Chris Angelico <ros...@gmail.com>:

> On Sat, Mar 19, 2016 at 8:28 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>> The file system does not have a problem. Python has a problem because it
>> tries to present pathnames as Unicode strings, which isn't always
>> possible.
>
> But what does a file name *mean*?

A Linux/UNIX file name is an extended ASCII string, where the
interpretation of bytes in the range 128..255 is left ambiguous.

That's the legacy of the early 1980s. At that time 8-bit bytes were
standard, and the parity nonsense was virtually gone. C, Emacs and the
OS supported those bytes without a problem but treated them as some sort
of control characters (they were represented with the octal \nnn
notation).

Some systems used the upper byte range for block graphics (CP/M). Some
systems used the upper byte range to represent Hebrew letters (Atari).

Then came ISO-8859-x and the locales (yuck!). Sun scrambled to make
SunOS "8-bit clean". ISO-8859-1 was widely taken as the default for the
Civilized World. Pathnames reflected that colonialist mindset.

ISO-8859-1 was the state of the art around 1995 (HTML). UCS-2 was the
avant-garde adopted by Windows and Java. UTF-8 came later, and Linux
luckily avoided the UCS-2 mess.

All that "extended ASCII" legacy is still the reality on Linux and won't
go away in the foreseeable future.

I suppose OS X is the only mainstream operating system that had the full
benefit of hindsight. And even they messed it up with case-insensitive
pathnames.

> If I were building an entire OS ecosystem from scratch today, I'd
> probably do a lot of things with a hybrid system of documented meaning
> atop implementation-detail APIs. In this particular case, I would
> define the API in terms of byte sequences, but clearly documenting
> that these byte sequences are to be understood to mean text strings,
> and thus must be valid UTF-8.

UTF-8 shouldn't have anything to do with the abstract pathnames (which
should be normalized Unicode).

Also, special-casing '\0' and '/' is lame. Why can't I have "Results
1/2016" as a filename?


Marko
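
For illustration, the pathname problem quoted at the top of this message
can be sketched in Python 3. This is only a minimal sketch under stated
assumptions: the file name b"caf\xe9.txt" is a made-up ISO-8859-1 example,
the locale is assumed to be UTF-8, and the helper names to_text and
to_bytes are hypothetical. It uses the "surrogateescape" error handler
(PEP 383) directly instead of touching a real file system.

```python
# Hypothetical legacy file name: "café.txt" encoded as ISO-8859-1.
# The byte 0xE9 is not valid UTF-8 in this position.
raw_name = b"caf\xe9.txt"


def to_text(name):
    """Decode a byte file name PEP 383 style, assuming a UTF-8 locale."""
    return name.decode("utf-8", "surrogateescape")


def to_bytes(text):
    """Reverse of to_text: turn escaped surrogates back into raw bytes."""
    return text.encode("utf-8", "surrogateescape")


# A strict decode fails: the name is simply not a Unicode string.
try:
    raw_name.decode("utf-8")
    strictly_decodable = True
except UnicodeDecodeError:
    strictly_decodable = False

# With surrogateescape the undecodable byte is smuggled through as a
# lone surrogate (U+DCE9), so the original bytes can be recovered.
text_name = to_text(raw_name)
round_tripped = to_bytes(text_name)

# The resulting str is not real text, though: a strict encode rejects it.
try:
    text_name.encode("utf-8")
    strictly_encodable = True
except UnicodeEncodeError:
    strictly_encodable = False
```

The round trip preserves the bytes, but the intermediate str contains a
lone surrogate and cannot be strictly encoded, which is the sense in
which presenting pathnames as Unicode strings "isn't always possible".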