Re: OT: Python

Russ Allbery Mon, 14 Feb 2011 13:11:27 -0800

Ian Jackson <ijack...@chiark.greenend.org.uk> writes:
> Klaus Ethgen writes:


>> No, it is not. 00a3 is just not a utf-8 character, it is unicode. To
>> get a correct utf-8 character you need to print \x{c2a3} and then
>> isutf8 is happy.

> When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> characters to stdout should use UTF-8.  That's what LC_CTYPE means.

Perl is specifically documented not to do this, for backward-compatibility
reasons.  In Perl, which is the language I know best, you have to decode
input and encode output explicitly if you want UTF-8 handling.

windlord:~> env LC_CTYPE=en_US.UTF-8 perl -e 'print "\x{00a3}\n"'
<glyph for mangled Unicode character>
windlord:~> env LC_CTYPE=en_US.UTF-8 perl -MEncode -e 'print encode("utf-8", "\x{00a3}\n")'
<proper Unicode pound sign>

See perlunicode(1).  The reasons for this turn out to be fairly good ones
if you don't want to badly break a bunch of existing Perl scripts that
deal with, for example, binary data.

-- 
Russ Allbery (r...@debian.org)               <http://www.eyrie.org/~eagle/>


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87lj1ijp93....@windlord.stanford.edu
