On Wed, 25 Jul 2018 at 13:48, Sven Van Caekenberghe <s...@stfx.eu> wrote:

> > On 25 Jul 2018, at 13:39, Damien Pollet <damien.pollet+ph...@gmail.com> wrote:
> >
> > Related issue: command line arguments come from VM system attributes as
> > ByteStrings… and thus interpreted as iso-8859-1, which is incorrect in most
> > cases nowadays, even though it seems to work as long as you only use ASCII.
> > Decoding them is easy enough, but it requires two copies (asByteString
> > utf8Decoded)
>
> Yes this is a really big issue. Anything coming in as command line arg or
> environment variable (or clipboard) is in a basically unknown OS determined
> encoding. I would assume/hope the UTF-8 is the sensible default today, but
> apparently not. And it is hard to find a cross platform solution.

My point here was that it would make more sense for those to be passed into
the image as ByteArrays, revealing the fact that their encoding is unknown.
Currently the bytes are correct, but since they've been shoved into
ByteStrings by the VM, the characters will be wrong unless your system
happens to be using Latin 1.

I suppose we can either have a setting for decoding (since it's pretty much
arbitrary), or heuristics like checking LC_CTYPE or whatever. Pablo mentioned
the Locale class, but it doesn't seem to detect anything correct from the
environment.
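
To make the problem concrete outside the image, here is a rough sketch in Python rather than Pharo. It only illustrates the two ideas above: recovering the raw bytes from a string that was built one character per byte, and guessing the encoding from the locale variables. The names `redecode` and `guess_encoding` are made up for this example, falling back to UTF-8 is my assumption, and the codeset found in the locale is not guaranteed to be a name the decoder knows.

```python
import os


def redecode(arg, encoding="utf-8"):
    """Recover the text of a string whose characters are really raw bytes.

    Encoding as Latin-1 maps each character back to the byte it came from;
    decoding those bytes with the actual encoding gives the intended text.
    Raises UnicodeError if the bytes are not valid in `encoding`.
    """
    return arg.encode("latin-1").decode(encoding)


def guess_encoding(env=None, default="utf-8"):
    """Guess the OS encoding from POSIX locale variables (a heuristic only).

    The first non-empty variable among LC_ALL, LC_CTYPE, LANG wins.
    Values look like 'fr_FR.UTF-8' or 'de_DE.ISO-8859-1@euro'; when there
    is no codeset part (for example 'C'), fall back to `default`, which is
    an assumption of this sketch.
    """
    env = os.environ if env is None else env
    names = ("LC_ALL", "LC_CTYPE", "LANG")
    value = next((env[n] for n in names if env.get(n)), "")
    if "." not in value:
        return default
    return value.split(".", 1)[1].split("@", 1)[0].lower()


# Simulate what the VM hands over: the UTF-8 bytes of "café",
# one character per byte, as if they were Latin-1.
from_vm = "café".encode("utf-8").decode("latin-1")

fixed = redecode(from_vm, guess_encoding({"LC_CTYPE": "fr_FR.UTF-8"}))
print(len(from_vm), len(fixed))  # 5 4
```

The mangled string is one character longer than the real one because the two UTF-8 bytes of é show up as two separate Latin-1 characters. With pure ASCII input both strings are identical, which is why the problem goes unnoticed for so long.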