Ludovic Courtès <ludo <at> gnu.org> writes:
> Yes, that's probably a good idea. At any rate, we only have
> `scm_to_locale_string ()' currently so it's not too late to add a single
> function with an encoding parameter in lieu of the proposed
> `scm_to_{utf8,utf16,utf32,ucs4,...}_string ()'.
>
> But first of all, one needs to implement Unicode support.
FWIW, I have a complete Unicode support library for Guile called GuICU. It
lives at http://gano.sourceforge.net. It works for me, but it hasn't been
widely tested.
It is built on the large and cumbersome IBM ICU library. ICU encodes strings
internally as UTF-16, which I have always thought of as a poor idea, since it
neither allows O(1) seeking of individual codepoints nor works well with UTF-8.
Based on my experience with ICU and putting this library together, and looking
at what R6RS claims should be the future for Unicode, I really do think that
UTF-32 is the way to go.
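
To make the seeking argument concrete, here is a minimal sketch in C. The
function names are made up for illustration and are not part of ICU, GuICU, or
Guile, and the UTF-16 reader assumes well-formed input. Fetching the n-th
codepoint from UTF-16 means walking the buffer, because any codepoint above
U+FFFF occupies two code units; with UTF-32 it is a plain array index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the codepoint at codepoint index N in a
   UTF-16 buffer of LEN code units, or -1 if N is out of range.
   Assumes well-formed input.  Cost is O(N): we must scan from the start. */
int32_t
utf16_ref (const uint16_t *s, size_t len, size_t n)
{
  size_t i = 0;

  while (i < len)
    {
      uint32_t c = s[i];
      size_t width = 1;

      /* A high surrogate followed by another unit forms one codepoint. */
      if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len)
        {
          uint32_t lo = s[i + 1];
          c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
          width = 2;
        }

      if (n == 0)
        return (int32_t) c;
      n--;
      i += width;
    }
  return -1;
}

/* Hypothetical helper: the same operation on UTF-32 is O(1). */
int32_t
utf32_ref (const uint32_t *s, size_t len, size_t n)
{
  return n < len ? (int32_t) s[n] : -1;
}
```

The price of the UTF-32 version is memory: four bytes per codepoint even for
plain ASCII text.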
Alternatively, one could build a string library where strings are represented
as either u8 or u32 vectors. If a string function is asked to operate on a u32
vector, it will assume a UTF-32 encoding. If a string function is asked to
operate on a u8 vector, it will either require a locale or, as a fallback,
treat the string as a raw byte vector.
This would be twice the work to implement, though.
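
A rough sketch of what that dual representation might look like in C follows.
All type and function names here are hypothetical, not an existing Guile or
GuICU interface, and only the raw-byte fallback is shown for u8 strings;
locale-aware decoding is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag saying which vector type backs the string. */
enum str_kind { STR_U8, STR_U32 };

/* Hypothetical string object: either a u8 vector or a u32 vector. */
struct dual_string
{
  enum str_kind kind;
  size_t len;                   /* number of elements, not bytes */
  union
  {
    const uint8_t *u8;          /* locale-encoded or raw bytes */
    const uint32_t *u32;        /* assumed to be UTF-32 */
  } data;
};

/* Return element N, or -1 if N is out of range.  A u32 string yields a
   codepoint; a u8 string is treated as a raw byte vector (the fallback
   case, with no locale decoding). */
int32_t
dual_string_ref (const struct dual_string *s, size_t n)
{
  if (n >= s->len)
    return -1;

  switch (s->kind)
    {
    case STR_U32:
      return (int32_t) s->data.u32[n];
    case STR_U8:
      return (int32_t) s->data.u8[n];
    }
  return -1;
}
```

Every string primitive would need a branch like the switch above, which is
where the doubled implementation effort comes from.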