Parag Kalra:
I am curious to know more about UTF and to understand the related issues that may
creep into my algorithm. Could someone please shed some light on it?

Can I use the following:

use Encode;

while(<$sample_file_fh>){

    # Encoding into utf data
    $utf_data = encode("utf8", $_);


I don't think the line above is right.
What you read from <$sample_file_fh> could be in any encoding, for example iso-8859-1, gb2312, or UTF-8. Calling encode() on it directly treats each input byte as a separate character, so data that is already UTF-8 ends up encoded twice. You want to translate the input into Perl's internal utf8 format first, which consists of a utf8 flag and the data part. After the translation, the utf8 flag should be on and the data part holds the text in utf8. You do this with the decode() function from the Encode module:

my $internal_utf8 = decode("gb2312", $_); # assuming the data was originally gb2312-encoded

After that, you can translate $internal_utf8 into any encoding you want, using the encode() function, also from the Encode module:

my $output = encode("utf8",$internal_utf8); # output with UTF-8 encoding
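
Putting the two steps together, here is a minimal sketch you can run on its own. The sample bytes are made up for illustration (two Chinese characters in gb2312), and I am assuming gb2312 as the input encoding only because I used it in the example above; substitute whatever encoding your file really uses.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample input: U+4E2D U+6587 stored as gb2312 bytes.
# In your script this would be a line read from $sample_file_fh.
my $gb2312_bytes = "\xD6\xD0\xCE\xC4";

# Step 1: decode the raw bytes into Perl's internal character string.
my $internal_utf8 = decode("gb2312", $gb2312_bytes);

# Step 2: encode the character string into the encoding you want to output.
my $output = encode("utf8", $internal_utf8);

printf "characters after decode: %d\n", length $internal_utf8;  # 2
printf "bytes after encode:      %d\n", length $output;         # 6
```

The decoded string has a length of 2 because Perl now counts characters, while the encoded output has a length of 6 because each of these characters takes three bytes in UTF-8.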


HTH.


    $data_string = $data_string.$utf_data;
}


