Eli Zaretskii <e...@gnu.org> writes:

>> From: David Kastrup <d...@gnu.org>
>> Date: Mon, 30 Jan 2017 19:32:14 +0100
>> Cc: guile-user@gnu.org
>> 
>> Emacs uses a UTF-8-based encoding internally: basically, valid UTF-8 is
>> represented as itself, a number of code points beyond the actual limit
>> of UTF-8 are used for non-Unicode character sets, and single bytes that
>> do not properly belong to the encoding being read are represented as
>> themselves for 0x00...0x7f and as 0xc0 0x80 ... 0xc0 0xbf and
>> 0xc1 0x80 ... 0xc1 0xbf for 0x80...0xff (the latter two ranges are
>> "overlong" encodings of 0x00...0x7f and consequently also not valid
>> UTF-8).
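
To make that escape scheme concrete, here is a minimal Python sketch of
the mapping described above (the function names are my own invention for
illustration, not anything from the Emacs sources):

    # Hypothetical helpers sketching the byte-escape scheme above;
    # illustration only, not actual Emacs code.
    def escape_raw_byte(b):
        # Map a stray byte 0x80..0xff to its overlong two-byte escape:
        # 0x80..0xbf -> 0xc0 0x80 .. 0xc0 0xbf,
        # 0xc0..0xff -> 0xc1 0x80 .. 0xc1 0xbf.
        assert 0x80 <= b <= 0xff
        return bytes([0xc0 | ((b >> 6) & 1), 0x80 | (b & 0x3f)])

    def unescape_raw_byte(esc):
        # Recover the original stray byte from its two-byte escape.
        lead, cont = esc
        assert lead in (0xc0, 0xc1) and 0x80 <= cont <= 0xbf
        return 0x80 | ((lead & 1) << 6) | (cont & 0x3f)

    for b in range(0x80, 0x100):
        esc = escape_raw_byte(b)
        assert unescape_raw_byte(esc) == b   # lossless round trip
        try:
            esc.decode("utf-8")       # overlong forms are rejected by any
        except UnicodeDecodeError:    # strict UTF-8 decoder, so the
            pass                      # escapes can never collide with text
        else:
            raise AssertionError("escape decoded as valid UTF-8")

That last check is the whole trick: the escapes are exactly the sequences
that encoding valid text can never produce, so the representation stays
unambiguous.
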
>
> One other crucial detail is that Emacs also has unibyte strings
> (arrays of bytes), which are necessary during startup, when Emacs
> doesn't yet know how to decode non-ASCII strings.  Without that, you
> wouldn't be able to start Emacs in a directory whose name includes
> non-ASCII characters, because it couldn't access files it needs to
> read to set up some of its decoding machinery.

Hm, I know that XEmacs-Mule emphatically does not have unibyte strings
(and Stephen considers them a complication and an abomination that should
never have been left in Emacs), so it must be possible to get away
without them.  And I don't think that XEmacs's comparatively worse Mule
implementation is due to that decision.
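
For what it's worth, Python runs into the same bootstrap problem Eli
describes, with file names it cannot decode yet, and copes within its one
text string type: the "surrogateescape" error handler (which CPython
itself uses when decoding POSIX file names, e.g. in os.fsdecode())
smuggles each undecodable byte through as a lone surrogate.  A minimal
sketch of that mechanism, not Emacs code:

    # Python maps each undecodable byte 0xNN to the lone surrogate
    # U+DCNN, so a file name read as bytes survives a decode/encode
    # round trip without needing a separate unibyte string type.
    raw = b"caf\xe9"                  # Latin-1 bytes, not valid UTF-8
    name = raw.decode("utf-8", "surrogateescape")
    assert name == "caf\udce9"        # stray byte parked in a surrogate
    assert name.encode("utf-8", "surrogateescape") == raw   # lossless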

-- 
David Kastrup
