On 9/14/05, Jeff Pan <[EMAIL PROTECTED]> wrote:
> HI,
>
> I want to translate some utf8 characters to appropriate characters.I use
> utf8 module,but it seems to work uncorrectly.
>
> This is the code:
>
> ----------------------------------------------------
> use utf8;
> #if (utf8::valid($subject))
> if (Encode::is_utf8($subject))
> {
> $subject=utf8::decode($subject);
> print $subject,"\n";
> }
> ----------------------------------------------------
>
> It seems that the is_utf8() function can't judge the given string is
> utf8 or not.
>
> my perl version is 5.8.0.
>
> can anyone give me some advises? thanks.
> --
> Jeff Pan
> [EMAIL PROTECTED]
>
Jeff,

is_utf8() doesn't really check whether a given string is UTF-8; it checks whether Perl's internal utf8 flag is set for the string. That's a big difference. encode() and decode() also behave in ways that may at first seem counterintuitive in this respect: encode() normally unsets the flag, and decode() normally sets it. See the docs for Encode for a complete explanation, and pay special attention to the numerous "CAVEAT" sections, especially the ones that read "CAVEAT: The following operations look the same but are not quite so".

It's not as crazy as it first appears, though. decode() and encode() are named relative to Perl's internal representation, and since Perl uses utf8 internally (mostly), decode() actually creates utf8 strings, unless the data is entirely ASCII (but see the docs). encode() does the opposite: it takes the (probably utf8) internal data and translates it to a different encoding.

In any case, Perl knows how its strings are represented, so it's normally safe to skip the test and just call encode($target, $output) on any data you want output in a particular encoding. Better yet, just set the encoding on the output stream, as with:

    open my $out, ">:encoding(Latin1)", $file;   # $file: whatever file you are writing to
    # or
    binmode(STDOUT, ":encoding(Big5)");

See the perldocs for Encode, perlio, perlunicode, perluniintro, encoding, perlebcdic, and utf8 for the gory details. Unicode support is still evolving, so they're pretty gory.

HTH,

--jay
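
For the archive, here is a minimal sketch of the flag behaviour described above. It is only an illustration: the variable names ($bytes, $chars, $copy, $latin1) and the sample string are made up for the example, and it assumes the input arrives as raw UTF-8 bytes. One more thing worth noting about the quoted snippet: utf8::decode() works in place and returns a success value, so assigning its result back to $subject replaces the string with that true/false value.

```perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# Raw bytes as they might arrive from a file or socket:
# "caf" followed by U+00E9 encoded as the two UTF-8 bytes C3 A9.
my $bytes = "caf\xC3\xA9";

# The bytes are well-formed UTF-8, but the internal flag is off,
# so is_utf8() reports false here.
my $flag_before = is_utf8($bytes) ? 1 : 0;

# decode() turns bytes into a character string and (for non-ASCII
# data) leaves the flag set on the result.
my $chars = decode('UTF-8', $bytes);
my $flag_after = is_utf8($chars) ? 1 : 0;

# encode() goes the other way: characters in, bytes out.
my $latin1 = encode('ISO-8859-1', $chars);

# utf8::decode() modifies its argument in place and returns a
# success value, so do not assign the return value to the string.
my $copy = $bytes;
my $ok   = utf8::decode($copy);

printf "flag before decode: %d\n", $flag_before;
printf "flag after decode:  %d\n", $flag_after;
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Let the output layer do the encoding instead of calling encode() by hand.
binmode(STDOUT, ":encoding(UTF-8)");
print $chars, "\n";
```

If all you need is output in a given encoding, the binmode() line at the end is usually the only part required; the flag checks are there just to show why is_utf8() answered "no" for data that really was UTF-8.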