Eli Zaretskii <e...@gnu.org> writes:

>> From: David Kastrup <d...@gnu.org>
>> Date: Mon, 30 Jan 2017 19:32:14 +0100
>> Cc: guile-user@gnu.org
>>
>> Emacs uses a UTF-8-based encoding internally: basically, valid UTF-8
>> is represented as itself, there are a number of code points beyond
>> the actual limit of UTF-8 that are used for non-Unicode character
>> sets, and single bytes not properly belonging to the read encoding
>> are represented with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and
>> 0xc1 0x80 ... 0xc1 0xbf (the latter two ranges are "overlong"
>> encodings of 0x00...0x7f and consequently also not valid UTF-8).
>
> One other crucial detail is that Emacs also has unibyte strings
> (arrays of bytes), which are necessary during startup, when Emacs
> doesn't yet know how to decode non-ASCII strings.  Without that, you
> wouldn't be able to start Emacs in a directory whose name includes
> non-ASCII characters, because it couldn't access files it needs to
> read to set up some of its decoding machinery.
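To make the byte mapping above concrete, here is a minimal sketch of
that scheme (in Python rather than Emacs's actual C, with function
names invented for the example): ASCII-range stray bytes stand for
themselves, while bytes 0x80...0xff get the two-byte C0/C1 sequences,
which a strict UTF-8 decoder rejects as overlong and which therefore
can never collide with valid text.

    # Illustrative sketch of the raw-byte scheme quoted above; not
    # Emacs's actual code, and the function names are hypothetical.

    def encode_raw_byte(b: int) -> bytes:
        """Map one stray byte to its two-byte internal form."""
        if b < 0x80:
            return bytes([b])  # ASCII-range bytes represent themselves
        # 0x80..0xbf -> 0xc0 0x80..0xc0 0xbf
        # 0xc0..0xff -> 0xc1 0x80..0xc1 0xbf
        return bytes([0xC0 | ((b >> 6) & 1), 0x80 | (b & 0x3F)])

    def decode_raw_byte(seq: bytes) -> int:
        """Invert encode_raw_byte."""
        if len(seq) == 1:
            return seq[0]
        lead, trail = seq
        return (0x80 | ((lead & 1) << 6)) | (trail & 0x3F)

    # Every byte value round-trips through the mapping.
    assert all(decode_raw_byte(encode_raw_byte(b)) == b
               for b in range(256))

    # The two-byte forms are overlong encodings of 0x00..0x7f, so a
    # strict UTF-8 decoder refuses them.
    for seq in (b"\xC0\x80", b"\xC1\xBF"):
        try:
            seq.decode("utf-8")
            raise AssertionError("overlong form was accepted")
        except UnicodeDecodeError:
            pass

The point of the design is visible in the assertions: the escape
sequences live entirely inside byte ranges that valid UTF-8 leaves
unused, so decoded text and smuggled raw bytes can coexist in one
string without ambiguity.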
Hm, I know that XEmacs-Mule emphatically does not have unibyte strings
(and Stephen considers them a complication and abomination that should
never have been left in Emacs), so it must be possible to get away
without them.  And I don't think that the comparatively worse Mule
implementation of XEmacs is due to that decision.

-- 
David Kastrup