<to...@tuxteam.de> writes:

> On Tue, Feb 14, 2017 at 10:19:14PM +0000, Chris Vine wrote:
>> On Tue, 14 Feb 2017 21:52:01 +0000 (UTC)
>> Mike Gran <spk...@yahoo.com> wrote:
>> [snip]
>> > > In particular, filenames are *not*, nor can they be mapped to,
>> > > Unicode strings in Linux.
>> >
>> > True.  Linux should follow OpenBSD and make all locales UTF-8.
>>
>> Filenames and locales are not necessarily related.  When you access a
>> networked file system, you get the filename encoding you are given,
>> which may or may not be the same as the particular locale encoding on
>> your particular machine on one particular day, and may or may not be a
>> unicode encoding.  Glib, for example, enables you to set this with the
>> G_FILENAME_ENCODING environmental variable [...]
>
> which is, btw., "just a better approximation", but still wrong: the
> application creating a directory might have been "in" a different
> locale (and thus having a different encoding) than the one creating
> the file within that directory.
>
> Most notably, the whole path might cross several mount points, thus
> the whole path can well have fragments coming from several file systems.
>
> I think the only sane way to see a Linux file system path is the way
> Linux sees it: as a byte string.
>
> Sure, some helper infrastructure to try to make characters of that
> mess will be welcome, but that should be absolutely robust wrt.
> unexpected input (e.g. bad UTF-8) and leave control to the application.
>
> Not easy.
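To make the "robust wrt. unexpected input" requirement from the quoted text concrete, here is a minimal sketch in Python (not Guile). It assumes Python 3's `surrogateescape` error handler is an acceptable stand-in for such helper infrastructure. The names `path_to_text` and `text_to_path` are hypothetical and belong to no library mentioned in this thread.

```python
# Hypothetical helpers for illustration; not part of Guile, Emacs, or Glib.

def path_to_text(raw: bytes) -> str:
    """Decode path bytes as UTF-8; each undecodable byte becomes a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_path(text: str) -> bytes:
    """Inverse of path_to_text: escaped surrogates turn back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# A path whose directory name was written by a Latin-1 application and whose
# file name was written by a UTF-8 application.
mixed = "caf\u00e9".encode("latin-1") + b"/" + "na\u00efve.txt".encode("utf-8")

text = path_to_text(mixed)
assert text_to_path(text) == mixed  # the byte string survives the round trip

try:
    mixed.decode("utf-8")  # strict decoding rejects the stray Latin-1 byte
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

The valid UTF-8 part of the path decodes to ordinary characters, while the stray Latin-1 byte is carried along as an escape code. The application can therefore display or inspect the text and still hand the kernel exactly the bytes it was given.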
If you tell Emacs that some external entity is in UTF-8, it will
represent all valid UTF-8 sequences as properly decoded characters, and
it has special codes for all bytes that are not part of valid UTF-8.  As
a result, it works with valid UTF-8 exactly as expected, but it will
also reproduce arbitrary byte streams thrown at it exactly when decoding
as UTF-8 and then re-encoding into UTF-8 again.

Guile lacks this byte stream reproducibility when decoding and
re-encoding.  That makes it a whole lot less robust for dealing with
externally provided material.

-- 
David Kastrup