Ian Jackson <ijack...@chiark.greenend.org.uk> writes:
> Klaus Ethgen writes:
>> No, it is not. 00a3 is just not a utf-8 character, it is unicode. To
>> get a correct utf-8 character you need to print \x{c2a3} and then
>> isutf8 is happy.

> When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> characters to stdout should use UTF-8. That's what LC_TYPE means.

Perl is specifically documented to not do this for backward compatibility
reasons. In Perl, which is the one I know best, you are required to
decode input and encode output if you want to have UTF-8 handling.

    windlord:~> env LC_CTYPE=en_US.UTF-8 perl -e 'print "\x{00a3}\n"'
    <glyph for mangled Unicode character>
    windlord:~> env LC_CTYPE=en_US.UTF-8 perl -MEncode -e 'print encode("utf-8", "\x{00a3}\n")'
    <proper Unicode pound sign>

See perlunicode(1). There are a variety of reasons for this that turn out
to be fairly good ones if you don't want to badly break a bunch of
existing Perl scripts that were dealing with, for example, binary data.

-- 
Russ Allbery (r...@debian.org)             <http://www.eyrie.org/~eagle/>

Archive: http://lists.debian.org/87lj1ijp93....@windlord.stanford.edu