I wrote:
>There's an obvious parallel with reading data from an input port.
>If setlocale is called, then input is by default decoded according
>to locale, including the very lossy ASCII decode for C/POSIX.  But if
>setlocale has not been called, then input is by default decoded according
>to ISO-8859-1, preserving the actual octets.  It would probably be most
>sensible that, if setlocale hasn't been called, getenv should likewise
>decode according to ISO-8859-1.  It might also be sensible to offer
>some explicit control over the encoding to be used with the environment,
>just as I/O ports have a concept of per-port selected encoding.

In the light of what I've learned recently about Guile's locale handling,
this needs some revision.  What I thought was a well-defined "setlocale
not called" state is a mirage.  The encoding of ports is not reliably
fixed at ISO-8859-1; per bug#22910 it can be affected by ostensibly
read-only calls to setlocale, and seems to be only accidentally
ISO-8859-1 until that's done.  So that's not a good model.

Due to the GUILE_INSTALL_LOCALE mechanism, a program wanting no locale
selected can't just never call setlocale in write mode.  So setlocale
not having been called is not really available as a way to control
anything.

So it would seem to be necessary to use some explicit control of
character encoding for environment access.  (This must be control of
encoding per se, not merely of which locale to use for environment
access, because, as I noted in the original report, there's no guarantee
of a locale with a suitable encoding.)  This could be an optional
parameter to the environment access functions, or a settable variable
that takes precedence over locale to determine encoding for all
environment access.  The latter would match the encoding model used
by ports.

-zefram
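
For illustration only, the two options above might look roughly like the following. This is a sketch in Python rather than Guile, and nothing in it is an existing or proposed Guile interface: the names `environment_encoding`, `decode_env_value` and `getenv_decoded` are hypothetical, and it assumes a POSIX system where `os.environb` exposes the raw octets of the environment.

```python
import locale
import os

# Hypothetical settable override, playing the role of a port's selected
# encoding.  None means "fall back to whatever the locale says".
environment_encoding = None


def decode_env_value(raw, encoding=None):
    """Decode raw environment octets to a string.

    Precedence: explicit argument, then the settable override,
    then the locale's preferred encoding.
    """
    chosen = (encoding
              or environment_encoding
              or locale.getpreferredencoding(False))
    # Undecodable octets become U+FFFD instead of raising.
    return raw.decode(chosen, errors="replace")


def getenv_decoded(name, encoding=None):
    """Hypothetical getenv variant taking an optional encoding parameter."""
    raw = os.environb.get(os.fsencode(name))  # raw octets; POSIX only
    return None if raw is None else decode_env_value(raw, encoding)


raw = b"caf\xe9 \xff"

# ISO-8859-1 maps every octet to a character, so the round trip is exact.
text = decode_env_value(raw, "iso-8859-1")
assert text.encode("iso-8859-1") == raw

# An ASCII decode is lossy: the two high octets are replaced.
lossy = decode_env_value(raw, "ascii")
assert lossy.count("\ufffd") == 2
```

With the settable-variable option, a program would assign `environment_encoding = "iso-8859-1"` once, and every later `getenv_decoded` call without an explicit argument would use it in preference to the locale.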