On 9/30/19 8:39 PM, Geoff Kuenning wrote:
> $'\361' is a valid character in Latin-1, which is how it happened to arise
> in my case.  Also, I tested with the C locale, which should be agnostic to
> character encodings, and got the same result.

That's the strange part. I can't reproduce this with the C locale at
all -- it's a separate code path that just treats every byte as a
character. I didn't try a lot of non-UTF-8 encodings, but I can't reproduce
it in any of the (mostly Western European) ISO 8859-1 locales I tried.
That's why I ended up using UTF-8 for my tests and figuring out where
the problem was.
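A small Python sketch of why the same byte behaves differently depending on the encoding (Python is used here only for illustration, and the helper name `is_valid_utf8` is hypothetical; this does not model bash's own locale code paths). In Latin-1, 0xF1 is one complete character. If each byte is treated as a character, as a C-locale code path does, it is simply one byte. In UTF-8, 0xF1 is the lead byte of a four-byte sequence, so on its own it is invalid.

```python
# Illustration only: how the single byte 0xF1 ($'\361') is interpreted
# under different encodings. Not a model of bash's internals.

RAW = b"\xf1"

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Latin-1 (ISO 8859-1) maps every byte 0x00-0xFF to exactly one character.
latin1_text = RAW.decode("latin-1")      # U+00F1, LATIN SMALL LETTER N WITH TILDE

# Byte-per-character view, like a C-locale code path: one byte, one char.
byte_chars = list(RAW)                   # [241]

# In UTF-8, 0xF1 starts a four-byte sequence, so alone it is incomplete.
utf8_ok = is_valid_utf8(RAW)             # False

# The same character, properly encoded as UTF-8, takes two bytes.
utf8_form = latin1_text.encode("utf-8")  # b'\xc3\xb1'

if __name__ == "__main__":
    print(len(latin1_text), byte_chars, utf8_ok, utf8_form)
```

So a filename byte that is perfectly ordinary under Latin-1 or the C locale becomes an invalid sequence the moment the locale is UTF-8.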

> 
> The general Unix philosophy, which in this case says "I'm not going to pass
> judgment on the weird things you do even though I don't understand them",
> argues for being able to handle any arbitrary sequence of bytes, at least
> on Linux.  

Yeah, on Linux, at least with the common file systems, the filenames are
still just byte sequences. That's not the case everywhere -- as I said, you
can't even create a file with an invalid byte sequence in the name on
Mac OS X, no matter what your locale is.
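A sketch of the Linux behavior, again in Python for illustration: passing `bytes` paths to `os.open` and `os.listdir` hands the name to the kernel unchanged, so a name containing 0xF1 can be created and listed back. `create_raw_name` is a hypothetical helper; whether it succeeds depends on the file system, and on Mac OS X the same call is expected to fail, as described above.

```python
import os
import tempfile

def create_raw_name(directory: bytes, name: bytes) -> bool:
    """Hypothetical helper: try to create an empty file named exactly `name`
    (raw bytes, no decoding) inside `directory`. False if the OS refuses."""
    path = os.path.join(directory, name)
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    except OSError:
        return False
    os.close(fd)
    return True

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)
        name = b"file\xf1"  # not valid UTF-8
        created = create_raw_name(tmp_b, name)
        # With a bytes argument, listdir returns the raw, undecoded names.
        print(created, created and name in os.listdir(tmp_b))
```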

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
