Marko Rauhamaa <ma...@pacujo.net> writes:

> David Kastrup <d...@gnu.org>:
>
>> Marko Rauhamaa <ma...@pacujo.net> writes:
>>> Guile's mistake was to move to Unicode strings in the operating system
>>> interface.
>>
>> Emacs uses an UTF-8 based encoding internally [...]
>
> C uses 8-bit characters. That is a model worth emulating.

That's Guile-1.8. Guile-2 uses either Latin-1 or UCS-4 in its string
internals, either Latin-1 or UTF-8 in its string API, and UTF-8 in its
string port internals.

> UTF-8 beautifully bridges the interpretation gap between 8-bit
> character strings and text. However, the interpretation step should be
> done in the application and not in the programming language.

Elisp is focused enough on text that I think its choice of going UTF-8
internally with a Unicode character type is reasonably sane. Its strings
(the quirky unibyte strings excluded) are its own variant of UTF-8
internally, and its string port equivalent (buffers) uses that same
variant of UTF-8. And its API talks UTF-8 for strings, Unicode (or
higher) for characters, and it indexes strings and buffers via Unicode
character counts. Not O(1), but with enough trickery that it works well
enough in practice.

If strings are to be implemented strictly Scheme-standard-conforming,
they need to be O(1) indexable. The Scheme standard is rather silent
about Unicode, however. I am not sure that sticking to the standard
where it does not deal with reality is the best choice.

I think the case for Guile-2 to _also_ support "unibyte strings" would
be considerably stronger than for Emacs (byte arrays and binary string
ports don't allow using Guile's string processing functions). As it
stands, the design of Guile-2 in my book currently involves too many
mandatory conversions for just passing data around with Guile itself
and Guile-based applications.

> Support libraries for Unicode are naturally welcome.
>
> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects. But is also causing unnecessary
> grief in the computer-computer interface, where the classic textual
> naming and textual protocols are actually cutely chosen octet-aligned
> binary formats.

Sometimes yes, sometimes not.

As long as Guile wants to be a general-purpose programming and
extension language, it should deal reliably, robustly, and reproducibly
with whatever is thrown at it. Its choice of libraries does not
currently make it so, but that could be fixed by either working on the
(GNU) libraries or by giving Guile its own implementation. But that
needs to be considered a priority. Nobody will do this just for fun and
kicks.

--
David Kastrup
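
To make two of the points above concrete, here is a minimal Python sketch (Python only because it is easy to run; it is not how Guile or Emacs implement anything). The helper names `utf8_char_at` and `survives_round_trip` are hypothetical and exist only for this illustration. The first shows why indexing UTF-8 by character count is not O(1) without extra bookkeeping: the bytes have to be scanned from the start. The second shows why a mandatory decoding step is a problem for arbitrary data: bytes that are not valid UTF-8 cannot pass through a UTF-8 decode unchanged, while a Latin-1 ("unibyte"-like) interpretation keeps every byte.

```python
def utf8_char_at(buf: bytes, index: int) -> str:
    """Return the character at position `index` in valid UTF-8 data.

    Characters have variable width, so the only way to find the
    index-th one without an auxiliary table is a linear scan.
    """
    if index < 0:
        raise IndexError(index)
    seen = -1      # number of character starts seen so far, minus one
    start = None   # byte offset where the wanted character begins
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a character starts here
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return buf[start:].decode("utf-8")


def survives_round_trip(raw: bytes, encoding: str) -> bool:
    """Check whether decoding and re-encoding returns the original bytes."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    text = "naïve Ω text"
    encoded = text.encode("utf-8")
    # Character index and byte offset diverge after the first non-ASCII character.
    print(len(text), len(encoded))
    print(utf8_char_at(encoded, 6))

    legacy_name = b"caf\xe9"  # a Latin-1 encoded name, not valid UTF-8
    print(survives_round_trip(legacy_name, "latin-1"))
    print(survives_round_trip(legacy_name, "utf-8"))
```

Fixed-width storage (Latin-1 or UCS-4) avoids the scan at the cost of memory, which is the trade-off the O(1) requirement forces.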