On Wed, Jun 26, 2013 at 7:45 AM, Peter Gordon <pete...@netspace.net.au> wrote:
> On Wed, 26 Jun 2013 12:36:01 +1200, Gregory Machin wrote:
> >
> >Looks like the data already is utf8, but the header of the XML
> >specifies otherwise.
> >How do you parse the data? Can you give us a short example file?
> >
> >Jenda
>
> This is a bit of code I adapt to whichever encoding I require.
>
> use open ":encoding(UTF-16le)";
> while( <> ) {
>   s/\x{FF}\x{FE}|\x{FFFF}//; # Remove BOM.
>   s/[\x0A\x0D]+$//; # Remove CR LF
>
> If you can get the data into a text editor which has a "convert" option,
> you can use it to either find out the encoding &/or change it to utf8.
> If you have a file with mixed encodings, you have my sympathies.
>

Encode::Guess may occasionally be useful:

  use Encode::Guess;
  my $decoder = Encode::Guess->guess("Grégoire");
  die $decoder unless ref $decoder;  # on failure guess() returns an error string, not an object
  print $decoder->name;  #---> utf8

-- 
Charles DeRykus
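
A slightly fuller sketch of the Encode::Guess idea above, for anyone who wants to go from the guess to decoded text. The sample bytes are an assumption for illustration (the UTF-8 encoding of "Grégoire", written as escapes so the result does not depend on how the script file itself is saved); real data would come from the file in question. Since guess() hands back an error string rather than an object when it cannot decide, the check is on ref().

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Hypothetical sample input: raw UTF-8 bytes for "Grégoire".
my $bytes = "Gr\xC3\xA9goire";

# With no extra suspects set, Encode::Guess tries ascii, utf8,
# and BOM-marked UTF-16/32.
my $decoder = Encode::Guess->guess($bytes);

# Success yields an encoding object; failure yields a plain error string.
ref($decoder) or die "Can't guess: $decoder";

# Turn the bytes into Perl characters using the guessed encoding.
my $text = $decoder->decode($bytes);

printf "%s, %d characters\n", $decoder->name, length $text;
```

The guess is only as good as the list of suspects, so for data that might be Latin-1 or another single-byte encoding it is worth treating the result as a hint, not a verdict.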