On Sat, Mar 19, 2016 at 8:02 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Chris Angelico <ros...@gmail.com>:
>> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>> It may be that Python's Unicode abstraction is an untenable illusion
>>> because the underlying reality is 8-bit and there's no way to hide it
>>> completely.
>>
>> The underlying reality is 1-bit. Or maybe the underlying reality is
>> actually electrical signals that don't even have a clear definition of
>> "bits" and bounce between two states for a few fractions of a second
>> before settling. And maybe someone's implementing Python on the George
>> Banks Kite CPU, which consists of two cents' worth of paper and
>> string, on which text is actually represented by glyph. They're all
>> equally valid notions of "underlying reality".
>>
>> Text is an abstract concept, just as numbers are.
>
> The question is how tenable the illusion is. If the OS gave the
> appropriate guarantees (say, all pathnames are encoded Unicode strings),
> the abstraction could be maintained. Unfortunately, the legacy shines
> through making you wonder if Python has overreached prematurely with its
> Unicode HAL.

The problem is not Python's Unicode strings, then. The problem is the
notion that path names are text. If they're text, they should be
exclusively text (although, for low-level efficiency, they're more
likely to be defined as "valid UTF-8 sequences" rather than "sequences
of Unicode codepoints"); since they're not, they are fundamentally
bytes. But that's not a problem with Python - it's a problem with the
file system.

ChrisA
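
To illustrate the point that path names are fundamentally bytes, here is
a minimal sketch. The file name is hypothetical (valid Latin-1, not valid
UTF-8), and UTF-8 is assumed as the file system encoding. On POSIX
systems os.fsdecode() and os.fsencode() apply the same "surrogateescape"
error handler (PEP 383) using whatever the real file system encoding is;
the sketch calls the codec directly so the result does not depend on the
platform.

```python
# Hypothetical file name: valid Latin-1, but not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Strict decoding fails: these bytes are not UTF-8 text.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xXX to the lone
# surrogate U+DCXX, so the name can be carried around as a str.
as_str = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = as_str.encode("utf-8", "surrogateescape")

# The resulting str is still not real text: a strict encode refuses it.
try:
    as_str.encode("utf-8")
    is_real_text = True
except UnicodeEncodeError:
    is_real_text = False

print(strict_ok, ascii(as_str), round_trip == raw_name, is_real_text)
```

The name survives the round trip, but only because the str is carrying
smuggled bytes rather than text, which is the sense in which the file
system, not Python, is the source of the mismatch.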