This short C program illustrates the issue. The locale, the output port, etc. are all UTF-8. The bad results are no surprise: the code currently in git for scm_puts etc. explicitly ignores the locale setting, always, and always assumes latin1 -- it's hard-coded in there.

--linas

#include <libguile.h>

void *wrap_eval(void* p)
{
    char *wtf = "(setlocale LC_ALL \"\")";
    SCM eval_str = scm_from_utf8_string(wtf);
    scm_eval_string(eval_str);
    return NULL;
}

void *wrap_puts(void* p)
{
    char *wtf = p;
    SCM port = scm_current_output_port ();
    scm_puts("the port-encoding is=", port);
    scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
    scm_puts("\nThe string to display is =", port);
    scm_puts (wtf, port);
    scm_puts("\nWas expecting to see this=", port);
    SCM str = scm_from_utf8_string(wtf);
    scm_display(str, port);
    scm_puts("\n\n", port);
    return NULL;
}

int main(int argc, char* argv[])
{
    scm_with_guile(wrap_eval, 0x0);

    char * wtf = "Ćićolina";
    scm_with_guile(wrap_puts, wtf);

    wtf = "Thủ Dầu Một";
    scm_with_guile(wrap_puts, wtf);

    wtf = "Småland";
    scm_with_guile(wrap_puts, wtf);

    wtf = "Hòa Phú Phú Tân";
    scm_with_guile(wrap_puts, wtf);

    wtf = "係 拉 丁 字 母";
    scm_with_guile(wrap_puts, wtf);
}

The output is always this:

the port-encoding is=UTF-8
The string to display is =Ćićolina
Was expecting to see this=Ćićolina

the port-encoding is=UTF-8
The string to display is =Thá»§ Dầu Má»™t
Was expecting to see this=Thủ Dầu Một

the port-encoding is=UTF-8
The string to display is =SmÃ¥land
Was expecting to see this=Småland

the port-encoding is=UTF-8
The string to display is =Hòa Phú Phú Tân
Was expecting to see this=Hòa Phú Phú Tân

the port-encoding is=UTF-8
Was expecting to see this=係 拉 丁 字 母 æ¯

What's cool is that all this stuff works in email!

--linas

On Mon, Jan 9, 2017 at 4:03 PM, Andy Wingo <wi...@pobox.com> wrote:
> On Sun 08 Jan 2017 19:16, Linas Vepstas <linasveps...@gmail.com> writes:
>
>> There appears to be a regression in guile-2.2 with utf8 handling
>> in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.
>>
>> In guile-2.0, one could give these utf8-encoded strings, and these
>> would display just fine. In 2.2 they get mangled.
>
> Could it be this from NEWS:
>
> ** Better locale support in Guile scripts
>
> When Guile is invoked directly, either from the command line or via a
> hash-bang line (e.g. "#!/usr/bin/guile"), it now installs the current
> locale via a call to `(setlocale LC_ALL "")'. For users with a unicode
> locale, this makes all ports unicode-capable by default, without the
> need to call `setlocale' in your program. This behavior may be
> controlled via the GUILE_INSTALL_LOCALE environment variable; see the
> manual for more.
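
For what it's worth, the mangled strings in the output above look like what you would get if each byte of the UTF-8 input were read as a separate latin1 character and then written back out as UTF-8 by the port. Here is a minimal sketch of that transformation in plain C, with no libguile involved. The helper name latin1_misread_to_utf8 is made up for illustration and is not a Guile function; this only reproduces the symptom and is not a claim about how scm_puts is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libguile): treat every byte of `in`
 * as one latin1 character and re-encode it as UTF-8 into `out`.
 * Returns the number of bytes written, not counting the trailing NUL,
 * or 0 if `out` is too small (or if `in` is empty). */
size_t latin1_misread_to_utf8(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return 0;

    size_t len = strlen(in);
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char) in[i];
        if (b < 0x80) {
            /* ASCII passes through unchanged. */
            if (n + 1 >= outsize)
                return 0;
            out[n++] = (char) b;
        } else {
            /* latin1 code points 0x80..0xFF need two bytes in UTF-8. */
            if (n + 2 >= outsize)
                return 0;
            out[n++] = (char) (0xC0 | (b >> 6));
            out[n++] = (char) (0x80 | (b & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the UTF-8 bytes of "Småland" (the å is the two bytes 0xC3 0xA5) yields the bytes for "SmÃ¥land", which matches the "The string to display is" line above.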