<to...@tuxteam.de> writes:

> On Tue, Feb 14, 2017 at 10:19:14PM +0000, Chris Vine wrote:
>> On Tue, 14 Feb 2017 21:52:01 +0000 (UTC)
>> Mike Gran <spk...@yahoo.com> wrote:
>> [snip]
>> > > In particular, filenames are *not*, nor can they be mapped to,
>> > > Unicode strings in Linux.
>> >
>> > True. Linux should follow OpenBSD and make all locales UTF-8.
>> 
>> Filenames and locales are not necessarily related.  When you access a
>> networked file system, you get the filename encoding you are given,
>> which may or may not be the same as the particular locale encoding on
>> your particular machine on one particular day, and may or may not be a
>> Unicode encoding.  Glib, for example, enables you to set this with the
>> G_FILENAME_ENCODING environment variable [...]
>
> which is, btw., "just a better approximation", but still wrong: the
> application creating a directory might have been "in" a different
> locale (and thus have had a different encoding) than the one creating
> the file within that directory.
>
> Most notably, the whole path might cross several mount points, thus
> the whole path can well have fragments coming from several file systems.
>
> I think the only sane way to see a Linux file system path is the way
> Linux sees it: as a byte string.
>
> Sure, some helper infrastructure to try to make characters of that
> mess will be welcome, but that should be absolutely robust wrt.
> unexpected input (e.g. bad UTF-8) and leave control to the application.
>
> Not easy.

If you tell Emacs that some external entity is in UTF-8, it will
represent all valid UTF-8 sequences as properly decoded characters, and
it has special codes for all bytes not part of valid UTF-8.

As a result, it handles valid UTF-8 exactly as expected, yet it also
reproduces arbitrary byte streams thrown at it byte for byte when
decoding them as UTF-8 and then reencoding into UTF-8 again.

Guile is lacking this byte stream reproducibility when
decoding/reencoding.  That makes it a whole lot less robust for dealing
with externally provided material.
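
To make the round-trip property concrete, here is a minimal sketch in
Python rather than Emacs Lisp or Guile.  It assumes Python's
"surrogateescape" error handler as a stand-in for the "special codes
for invalid bytes" idea; it is not how Emacs implements it internally.
The helper name roundtrip and the sample file name are made up for
illustration, and the "replace" handler is only there as an example of
a lossy strategy, not as a description of what Guile does.

```python
def roundtrip(raw: bytes, errors: str) -> bytes:
    """Decode raw as UTF-8 with the given error handler, then re-encode."""
    text = raw.decode("utf-8", errors=errors)
    return text.encode("utf-8", errors=errors)


# Hypothetical file name: valid UTF-8 "é" (C3 A9) followed by a stray
# Latin-1 "é" (E9), which is not valid UTF-8 on its own.
raw = b"caf\xc3\xa9-\xe9.txt"

# Invalid bytes are carried through as escape code points and restored
# on encoding, so the original byte string comes back unchanged.
lossless = roundtrip(raw, "surrogateescape")

# Invalid bytes are replaced by U+FFFD on decoding, so the original
# byte is gone by the time the string is re-encoded.
lossy = roundtrip(raw, "replace")

print(lossless == raw)
print(lossy == raw)
```

With the escaping handler the comparison holds for any input bytes,
valid UTF-8 or not, which is the kind of robustness one wants for file
names received from the kernel.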

-- 
David Kastrup

