On Wed, 25 Jul 2018 at 13:48, Sven Van Caekenberghe <s...@stfx.eu> wrote:

> > On 25 Jul 2018, at 13:39, Damien Pollet <damien.pollet+ph...@gmail.com>
> > wrote:
> > Related issue: command line arguments come from VM system attributes as
> > ByteStrings… and thus interpreted as iso-8859-1, which is incorrect in most
> > cases nowadays, even though it seems to work as long as you only use ASCII.
> > Decoding them is easy enough, but it requires two copies (asByteString
> > utf8Decoded)
>
> Yes, this is a really big issue. Anything coming in as a command line arg or
> environment variable (or clipboard) is in a basically unknown, OS-determined
> encoding. I would assume/hope that UTF-8 is the sensible default today, but
> apparently not. And it is hard to find a cross-platform solution.
>

My point here was that it would make more sense for those to be passed into
the image as ByteArrays, revealing the fact that their encoding is unknown.
Currently the bytes are correct, but since they've been shoved into
ByteStrings by the VM, the characters will be wrong unless your system
happens to be using Latin 1.
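
To make that concrete, here is a small sketch of the effect. It is written in Python rather than Pharo, purely to show what happens at the byte level, and the sample argument "café" is made up. It assumes the OS delivered the argument as UTF-8.

```python
# Hypothetical command-line argument as the OS delivers it: UTF-8 bytes.
raw_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9', five bytes

# What effectively happens when the bytes are put into a byte string:
# each byte becomes one character, which amounts to decoding as
# ISO-8859-1 (Latin-1).
as_byte_string = raw_bytes.decode("latin-1")

# The bytes themselves are intact, so the damage is reversible: go back
# to bytes (first copy), then decode with the right encoding (second copy).
recovered = as_byte_string.encode("latin-1").decode("utf-8")
```

The mis-decoded string has five characters instead of four, and the round trip back through bytes is the two-copy workaround mentioned in the quoted message. Handing over the raw bytes in the first place would avoid the first of those copies.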

I suppose we can either have a setting for the decoding (since the choice is
pretty much arbitrary), or use heuristics such as checking LC_CTYPE. Pablo
mentioned the Locale class, but it doesn't seem to detect anything correctly
from the environment.
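
A rough sketch of what such a heuristic could look like, again in Python only for illustration. The function name guess_encoding and the UTF-8 fallback are my own assumptions, not anything that exists in the image. It follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and would not help on Windows.

```python
import codecs


def guess_encoding(env, default="utf-8"):
    """Guess the encoding of OS-supplied bytes from POSIX locale variables.

    `env` is a mapping of environment variable names to values.
    Hypothetical helper: the name and the fallback are assumptions.
    """
    # POSIX precedence: the first non-empty variable decides the locale.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if not value:
            continue
        # Locale format: language[_territory][.codeset][@modifier]
        locale_part = value.split("@", 1)[0]
        if "." not in locale_part:
            # No codeset given (e.g. "C" or "fr_FR"): nothing to go on.
            return default
        codeset = locale_part.split(".", 1)[1]
        try:
            # Normalise to a codec name Python knows about.
            return codecs.lookup(codeset).name
        except LookupError:
            return default
    return default
```

For example, guess_encoding({"LC_CTYPE": "fr_FR.UTF-8"}) yields "utf-8". This is still only a guess: the locale says what the user's terminal expects, not what encoding a given argument was actually written in, so an explicit setting to override it would still be useful.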
