Linas Vepstas <linasveps...@gmail.com> writes:

> On Mon, Jan 30, 2017 at 1:27 PM, David Kastrup <d...@gnu.org> wrote:
>> Marko Rauhamaa <ma...@pacujo.net> writes:
>>> David Kastrup <d...@gnu.org>:
>>>> Marko Rauhamaa <ma...@pacujo.net> writes:
>>>>> Guile's mistake was to move to Unicode strings in the operating
>>>>> system interface.
>>>>
>>>> Emacs uses a UTF-8 based encoding internally [...]
>>>
>>> C uses 8-bit characters.  That is a model worth emulating.
>>
>> That's Guile-1.8.  Guile-2 uses either Latin-1 or UCS-4 in its string
>> internals, either Latin-1 or UTF-8 in its string API, and UTF-8 in its
>> string port internals.
>
> Which seems to be a bad decision.  I've got strings, 10 MBytes long,
> holding Chinese in UTF-8, and Guile converts these internally to UCS-4,
> which is a complete and total waste of CPU time.  WTF.  It then has to
> convert them back to UTF-8 before passing them to my C++ code that
> actually does stuff with them.
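Each crossing of that C++/Guile boundary transcodes the whole buffer.
A minimal sketch of what such a round trip looks like, assuming the
strings move through Guile's public C API (scm_from_utf8_stringn and
scm_to_utf8_stringn, both part of Guile 2.x) and that the code runs in
Guile mode:

#include <libguile.h>
#include <stdlib.h>

static void
round_trip (const char *utf8_buf, size_t len)
{
  /* UTF-8 -> internal representation (Latin-1 or UCS-4): one full
     pass over the buffer, plus a copy.  */
  SCM s = scm_from_utf8_stringn (utf8_buf, len);

  /* ... Scheme-level processing of S would happen here ...  */

  /* Internal representation -> UTF-8 again before handing the data
     back to C++: a second full pass, plus another copy.  */
  size_t out_len;
  char *out = scm_to_utf8_stringn (s, &out_len);

  /* The caller owns the buffer returned by scm_to_utf8_stringn.  */
  free (out);
}

For a 10 MB Chinese string, both passes touch every byte, which is the
CPU time being complained about above.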
I see this as an interaction problem: Guile 2.0 uses UCS-4 internally,
and your code uses UTF-8.  It could have been the other way around.

There have been discussions about moving to UTF-8 internally in 2.2.
As Mike explained, that was not really an option in 2.0, mostly because
of the requirement to support O(1) random access (string-ref and
friends).

<https://github.com/larcenists/larceny/wiki/StringRepresentations>
lists various options and the tradeoffs involved.

Ludo'.
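P.S.  A minimal sketch of that random-access tradeoff, in plain C; the
names below are illustrative, not Guile internals, and the UTF-8
version assumes well-formed input:

#include <stdint.h>
#include <stddef.h>

/* Fixed-width representation (UCS-4): the k-th character is plain
   array indexing, O(1).  */
static uint32_t
ucs4_ref (const uint32_t *chars, size_t k)
{
  return chars[k];
}

/* Variable-width representation (UTF-8): finding the k-th character
   means scanning from the start, O(k).  Continuation bytes match
   10xxxxxx, i.e. (byte & 0xC0) == 0x80.  */
static const uint8_t *
utf8_ref (const uint8_t *s, size_t k)
{
  while (k > 0)
    {
      s++;                          /* step past the lead byte */
      while ((*s & 0xC0) == 0x80)   /* skip continuation bytes */
        s++;
      k--;
    }
  return s;                         /* lead byte of the k-th char */
}

This is why Guile 2.0 widens non-Latin-1 strings to UCS-4 rather than
keeping UTF-8: string-ref must not degrade to a scan.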