From: "Jay Savage" <[EMAIL PROTECTED]>
Try to unpack the data--or a chunk of data you feel is large enough to
be representative--with the pattern U0U*. If the unpack succeeds with
no warnings, you have valid utf8. You could try the same thing with
Encode's 'decode_utf8' routine. See perluniintro for details. In both
cases, though, you need to make sure that you've grabbed well-formed
utf8 from the source file in the first place. If the data cuts off in
the middle of a multi-byte character, you'll get an error.
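
A minimal sketch of the Encode-based check described above. The helper name is_valid_utf8 is hypothetical and not from this thread, and the sketch assumes the data is held as a raw byte string (read without any decoding layer). It uses the strict 'UTF-8' encoding with FB_CROAK so that malformed input raises an error instead of passing silently.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not from the original thread): returns 1 if
# $bytes is a well-formed UTF-8 byte string, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying its source argument in place.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# "caf" followed by the two-byte UTF-8 encoding of e-acute.
print is_valid_utf8("caf\xC3\xA9") ? "valid\n" : "invalid\n";

# "caf" followed by the single Latin-1 byte for e-acute, then ASCII text.
print is_valid_utf8("caf\xE9 au lait") ? "valid\n" : "invalid\n";
```

Note that pure ASCII data also passes this check, since ASCII is a subset of UTF-8, so a "valid" result only says the bytes could be UTF-8, not that they were written as UTF-8.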

I have tried verifying the entire string, using the following:

my $result = unpack("U0U*", $content);
print $result;

It gave no errors whether or not the string was UTF-8, but interestingly the result printed was always 65279 when the string was UTF-8, and 112 or 116 when it was not.

Do you know what these numbers represent? I am curious why it sometimes prints 112 and sometimes 116 for ANSI strings. I hope the result is consistent, so that I can rely on it in my program to check whether a string is UTF-8.

Thank you.

Octavian
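
A note on the numbers above, as a small sketch rather than a definitive diagnosis: unpack returns only its first value when called in scalar context, so the snippet prints a single code point instead of checking the whole string. 65279 is 0xFEFF, the byte order mark, and 112 and 116 are the code points of 'p' and 't'. That these are simply the first characters of the files tested is an assumption; the message does not show the data. The "C*" template is used here only because it behaves the same across Perl versions.

```perl
use strict;
use warnings;

# 65279 is 0xFEFF, the code point of the byte order mark (BOM).
printf "%d = U+%04X\n", 65279, 65279;

# 112 and 116 are the code points of the letters 'p' and 't'.
printf "%d = '%s'\n", $_, chr($_) for 112, 116;

# unpack yields only its first value in scalar context,
# and the full list of values in list context.
my $first = unpack("C*", "test");
my @all   = unpack("C*", "test");
print "scalar context: $first\n";
print "list context:   @all\n";
```

If this reading is right, the printed number reflects only how the data begins, so it is not a reliable test of whether the whole string is UTF-8.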

