On Sun, Apr 07, 2013 at 06:08:52PM +0200, Christophe Martin wrote:
> Hello,

Hello,

> I'm trying to use JSON::XS. The trouble is that I have French strings
> with accented chars. The solutions I tried don't work. Also, I'm
> not sure I understand the JSON::XS doc on Unicode and UTF-8.
> 
> I use linux. The shell, terminal, and files all use utf-8.
> Locales are installed, and I have LANG=fr_FR.UTF-8 and a bunch of
> LC_THING=fr_FR.UTF-8 env variables. Apart from my script, everything
> works perfectly well.
> 
> As you can see below, with "use utf8;", json is ok and can be
> decoded by other progs, but perl strings turn to latin1 encoding.
> with "no utf8" or without "use utf8", json strings are
> «utf8 encoded twice» but simple perl strings are ok.
> 
> Ubuntu linux 12.04 with perl 5.14.2, and JSON::XS 2.320-1build1
> if that matters
> 
> I must have missed something, hopefully simple. Any idea?
> 
> #! /usr/bin/perl
> use utf8;
> use strict;
> use warnings;
> use JSON::XS;
> 
> my %srchash = ( 'éléphant' => 'ça trompe' );
> my $json = encode_json \%srchash;
> my %dsthash = %{decode_json $json};
> while ( my ($key, $value) = each %srchash ) {
>     print $key, ' => ', $value, "\n";
> }
> print $json, "\n";
> while ( my ($key, $value) = each %dsthash ) {
>     print $key, ' => ', $value, "\n";
> }
> 
> The results without "use utf8;" (or with "no utf8;") :
> éléphant => ça trompe
> {"éléphant":"ça trompe"}
> éléphant => ça trompe
> 
> with "use utf8", here is what I get
> �l�phant => �a trompe
> {"éléphant":"ça trompe"}
> �l�phant => �a trompe

The documentation says that the exported decode_json and
encode_json subs expect the JSON text to be UTF-8 encoded. That
is, encode_json returns, and decode_json expects, a binary string
encoded in UTF-8, while the keys and values in the Perl data
structure are treated as character strings. JSON::XS encodes and
decodes between the two as necessary.
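
To illustrate where the «utf8 encoded twice» output can come from,
here is a minimal sketch. It assumes JSON::XS is installed and uses
\x{..} escapes so it does not depend on the encoding of the file.
The variable names are mine, not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw(encode);
use JSON::XS;

# "\x{e9}" is the single character U+00E9 (e with acute accent).
my $chars = "\x{e9}l\x{e9}phant";        # character string
my $bytes = encode('UTF-8', $chars);     # the same word as UTF-8 bytes

# encode_json treats every string it is given as characters and
# returns UTF-8 encoded bytes.
my $once  = encode_json([$chars]);   # each accented character -> 2 bytes
my $twice = encode_json([$bytes]);   # each of those bytes is encoded again

printf "once: %d bytes, twice: %d bytes\n", length $once, length $twice;

# Round trip: decode_json gives back the original character string.
print "round trip ok\n" if decode_json($once)->[0] eq $chars;
```

Without "use utf8", the string literals in the original script are
byte strings like $bytes here, which would explain the doubled
encoding in the JSON output.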

The utf8 pragma tells Perl that the source code is UTF-8 encoded,
and that Perl should automatically decode string literals in the
source (including hash keys) into character strings.
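
A small sketch of that effect, assuming the file itself is saved as
UTF-8 (the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($with, $without);
{
    use utf8;     # literals in this block are decoded into characters
    $with = 'éléphant';
}
{
    no utf8;      # literals in this block stay as the raw bytes of the file
    $without = 'éléphant';
}

# length counts characters for $with, but bytes for $without.
printf "with: %d, without: %d\n", length $with, length $without;
```

The pragma is lexically scoped, which is why the two blocks can
behave differently within one file.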

So I think that the use case for decode_json and encode_json is
different, though I honestly can't figure out what it would be...

This seems to work though. Note that I explicitly set the IO
encoding layer for the STDOUT handle and use the JSON::XS OO
interface with automatic UTF-8 decoding/encoding disabled (since
Perl has already handled that for us).

#! /usr/bin/perl

use utf8;
use strict;
use warnings;

use Data::Dumper;
use Encode;
use JSON::XS;

binmode \*STDOUT, ':encoding(UTF-8)';

my %src = ( 'éléphant' => 'ça trompe' );

my $jsonizer = JSON::XS->new();
my $json = $jsonizer->encode(\%src);
my %dst = %{$jsonizer->decode($json)};

print "$_\n" for keys %src, values %src;
print $json, "\n";
print "$_\n" for keys %dst, values %dst;

__END__

Since we used the utf8 pragma, Perl is already decoding the keys
and values as UTF-8 and storing them appropriately internally
(the details of which shouldn't matter to us). So JSON::XS
shouldn't need to do any character encoding/decoding with the
data. It is already internally stored properly and all operations
on those values in Perl should be character-wise automatically.

I'm not really sure then what purpose the JSON::XS utf8 option is
intended to serve. It sounds to me like it controls the JSON text
rather than the data structure: when enabled, encode returns (and
decode expects) UTF-8 encoded bytes. I suppose that is handy when
the JSON goes straight to a file or socket with no encoding layer,
but otherwise you could just decode the data from wherever into
text strings and happily work with them from there...
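
One way to probe what the option does is to compare the two
settings side by side. This is only a sketch, assuming JSON::XS is
installed; the data and variable names are made up for the test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use JSON::XS;

# Character strings written with escapes, so the file encoding is irrelevant.
my $data = { "\x{e9}l\x{e9}phant" => "\x{e7}a trompe" };

my $text  = JSON::XS->new->encode($data);         # utf8 off: character string
my $bytes = JSON::XS->new->utf8->encode($data);   # utf8 on: UTF-8 bytes

# The byte form is longer because each accented character takes two bytes.
printf "text: %d, bytes: %d\n", length $text, length $bytes;

# Each form decodes with the matching setting, and both give back
# character-string keys.
my ($k1) = keys %{ JSON::XS->new->decode($text) };
my ($k2) = keys %{ JSON::XS->new->utf8->decode($bytes) };
print "same key from both\n" if $k1 eq $k2;
```

With the option off, the result needs an encoding layer (such as
the binmode call in the script above) before it is printed. With
it on, the bytes can be written out as they are.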

I wonder if maybe it could be a bug in JSON::XS that it modifies
these already Unicode-aware scalars. I am not qualified to assert
that it is so. The documentation seems clear that the behavior is
by design and I have to assume that the module author(s) know
more about Unicode in Perl than I do. :)

Regards,


-- 
Brandon McCaig <bamcc...@gmail.com> <bamcc...@castopulence.org>
Castopulence Software <https://www.castopulence.org/>
Blog <http://www.bamccaig.com/>
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }.
q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.};
tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'
