On 9/14/05, Jeff Pan <[EMAIL PROTECTED]> wrote:
> HI,
> 
> I want to translate some utf8 characters to appropriate characters.I use
> utf8 module,but it seems to work uncorrectly.
> 
> This is the code:
> 
> ----------------------------------------------------
>         use utf8;
>         #if (utf8::valid($subject))
>         if (Encode::is_utf8($subject))
>         {
>                 $subject=utf8::decode($subject);
>                 print $subject,"\n";
>         }
> ----------------------------------------------------
> 
> It seems that the is_utf8() function can't judge the given string is
> utf8 or not.
> 
> my perl version is 5.8.0.
> 
> can anyone give me some advises? thanks.
> --
>   Jeff Pan
>   [EMAIL PROTECTED]
> 

Jeff,

is_utf8 doesn't really check whether a given string is utf8; it checks
whether Perl's internal utf8 flag is set for the string. That's a big
difference. encode() and decode() also behave in ways that may at
first seem counterintuitive in this respect: encode normally unsets
the flag, and decode normally sets it. See the docs for Encode for a
complete explanation, and pay special attention to the numerous
"CAVEAT" notes, especially the ones that read "CAVEAT: The following
operations look the same but are not quite so;"
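
To make the flag-versus-content distinction concrete, here is a small
sketch. The sample string is made up for illustration; it assumes the
data arrives as raw UTF-8 bytes, as it would from a file or socket
read without any encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "café", as read from outside.
my $bytes = "caf\xC3\xA9";

# The content is valid UTF-8, but the internal flag is not set yet.
my $flag_before = Encode::is_utf8($bytes) ? 1 : 0;

# decode() turns the bytes into characters and sets the flag.
my $chars      = decode('UTF-8', $bytes);
my $flag_after = Encode::is_utf8($chars) ? 1 : 0;

# encode() goes back to bytes, and the flag is off again.
my $again        = encode('UTF-8', $chars);
my $flag_encoded = Encode::is_utf8($again) ? 1 : 0;

print "flag before decode: $flag_before\n";
print "flag after decode:  $flag_after\n";
print "flag after encode:  $flag_encoded\n";
print "bytes: ", length($bytes), ", characters: ", length($chars), "\n";
```

So a string can hold perfectly good UTF-8 and still make is_utf8
return false, simply because nothing has decoded it yet.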

It's not as crazy as it first appears, though. decode and encode refer
to Perl's internal representation, and since Perl uses utf8 internally
(mostly), decode actually creates utf8 strings (unless the data is
entirely ascii, but see the docs). encode does the opposite: it takes
the (probably utf8) internal data and translates it to a different
encoding. In any case, Perl knows how its strings are represented, so
it's normally safe to ignore the test and just use
'encode(target, $output)' on any data you want output in a particular
encoding.
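
If you do want to know whether some bytes are well-formed utf8, one
way is to try decoding a copy and look at the return value. The sketch
below assumes $subject holds raw bytes; the sample values and the
Latin-1 target are only for illustration. Note that utf8::decode works
in place and returns just a true/false result, so assigning that
result back to $subject (as in your code) throws the string away.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical inputs: UTF-8 bytes for "Grüße", and a Latin-1 "café"
# whose trailing \xE9 is not well-formed UTF-8.
my $subject = "Gr\xC3\xBC\xC3\x9Fe";
my $broken  = "caf\xE9";

# utf8::decode converts its argument in place; work on copies so the
# original bytes survive, and keep the return value as the test.
my $copy     = $subject;
my $ok       = utf8::decode($copy) ? 1 : 0;
my $bad_copy = $broken;
my $bad_ok   = utf8::decode($bad_copy) ? 1 : 0;

# Once the data is decoded, encode() produces bytes in the target
# encoding.
my $latin1 = encode('ISO-8859-1', $copy);

print "well-formed: $ok, broken: $bad_ok\n";
print "characters: ", length($copy), ", Latin-1 bytes: ", length($latin1), "\n";
```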

Better yet, just set the encoding for the output stream, as with:

   # $file is a placeholder for your output path
   open my $out, ">:encoding(Latin1)", $file or die "open: $!"; # or
   binmode(STDOUT, ":encoding(Big5)");
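
For a version you can run without touching any real files, this sketch
writes the same text through two encoding layers into in-memory
filehandles. The variable names and sample text are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical text: "naïve café" as a character string.
my $text = "na\x{EF}ve caf\x{E9}";

# Write through a Latin-1 layer into an in-memory buffer.
my $latin1_bytes = '';
open(my $latin1_out, '>:encoding(ISO-8859-1)', \$latin1_bytes)
    or die "open: $!";
print {$latin1_out} $text;
close($latin1_out) or die "close: $!";

# The same text through a UTF-8 layer.
my $utf8_bytes = '';
open(my $utf8_out, '>:encoding(UTF-8)', \$utf8_bytes)
    or die "open: $!";
print {$utf8_out} $text;
close($utf8_out) or die "close: $!";

print "Latin-1 bytes: ", length($latin1_bytes), "\n";
print "UTF-8 bytes:   ", length($utf8_bytes), "\n";
```

The layer does the conversion on every print, so the rest of the
program can work with characters and never call encode() by hand.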

See the perldocs for Encode, perlio, perlunicode, perluniintro,
encoding, perlebcdic, and utf8 for the gory details. Unicode support
is still evolving, so they're pretty gory.

HTH,

--jay

--------------------------------------------------
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.dpguru.com  http://www.engatiki.org

values of β will give rise to dom!
