Hi - I am new to international character encodings and to how Perl handles them. After a day of reading, I'm asking for help.
I am downloading data from an international (French) web site. The HTTP headers show that the pages I am downloading are encoded in iso-8859-1. Most characters (all the accented letters) are fine, but some (e.g. the trademark sign) are incorrect. Here is a working sample script:

    #!/usr/bin/perl
    use strict;
    use warnings FATAL => 'all';
    use LWP::Simple;
    use Encode;

    binmode STDOUT, ":utf8";

    my $content = get( "http://www.formula1.com/race/circuitdetail/773.html" )
        or die "get failed.\n";
    my( $name ) = $content =~ /<td class="articleTitle">(.+?)<\/td>/s;

    print "name w/o decode:\n";
    print $name, "\n";

    my $name1 = decode( 'iso-8859-1', $name );
    print "name w/decode:\n";
    print $name1, "\n";

    $name =~ s/\x{99}/\x{2122}/g;
    print "name manually converted:\n";
    print $name, "\n";

The output is:

    name w/o decode:
    FORMULA 1 Gran Premio de España Telefónica 2007
    name w/decode:
    FORMULA 1 Gran Premio de España Telefónica 2007
    name manually converted:
    FORMULA 1™ Gran Premio de España Telefónica 2007

How do I get a proper conversion from iso-8859-1 to Perl's internal utf8? Is there a way to ask LWP to do this based on the character encoding specified in the HTTP headers?

I am using:

    This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

on Debian unstable:

    Linux hanako 2.6.18-4-amd64 #1 SMP Mon Mar 26 11:36:53 CEST 2007 x86_64 GNU/Linux
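
One observation that may help narrow this down: in Windows-1252 (cp1252), byte 0x99 is the trademark sign, whereas in true iso-8859-1 it is a C1 control character. So the page may really be cp1252 that is merely labelled iso-8859-1. Below is a minimal sketch of that guess, not a confirmed fix. The sample bytes are typed in by hand to stand in for the downloaded title, and decode_cp1252 is a hypothetical helper name of my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat the raw bytes as Windows-1252 rather than ISO-8859-1.
sub decode_cp1252 {
    my ($bytes) = @_;
    return decode('cp1252', $bytes);
}

# Assumed sample bytes (not fetched from the site): 0x99 follows "FORMULA 1".
my $raw = "FORMULA 1\x{99} Gran Premio de Espa\x{F1}a Telef\x{F3}nica 2007";

my $latin1 = decode('iso-8859-1', $raw);   # what the HTTP header claims
my $cp1252 = decode_cp1252($raw);          # what the bytes appear to be

binmode STDOUT, ':encoding(UTF-8)';

# The character at index 9 is the one that started out as byte 0x99.
printf "iso-8859-1 maps 0x99 to U+%04X\n", ord(substr($latin1, 9, 1));
printf "cp1252     maps 0x99 to U+%04X\n", ord(substr($cp1252, 9, 1));
print $cp1252, "\n";
```

If this guess is right, the accented letters should decode the same either way, since the two encodings appear to differ only in the 0x80-0x9F range. On the LWP side, my reading of the HTTP::Message documentation is that switching to LWP::UserAgent and calling `$response->decoded_content( charset => 'cp1252' )` would override the charset from the headers, but I have not verified that on this setup.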