On Wed, Jun 26, 2013 at 7:45 AM, Peter Gordon <pete...@netspace.net.au> wrote:

> On Wed, 26 Jun 2013 12:36:01 +1200, Gregory Machin wrote:
> >
> >Looks like the data already is utf8, but the header of the XML
> >specifies otherwise.
> >How do you parse the data? Can you give us a short example file?
> >
> >Jenda
>


> This is a bit of code I adapt to whichever encoding I require.
>
> use open ":encoding(UTF-16le)";
> while( <> ) {
>         s/^\x{FEFF}//;                          # Remove BOM (U+FEFF once decoded).
>         s/[\x0A\x0D]+$//;                       # Remove CR LF
>         # ... process the line here ...
> }
>
> If you can get the data into a text editor which has a "convert" option,
> you can use it to find out the encoding and/or change it to utf8.
> If you have a file with mixed encodings, you have my sympathies.
>
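
If no editor with a "convert" option is at hand, the same conversion can
be sketched in Perl with the core Encode module. This is only a minimal
sketch: the helper name utf16le_to_utf8 is made up for illustration, and
it assumes the input really is UTF-16LE, as in the snippet quoted above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert raw UTF-16LE bytes to UTF-8 bytes,
# dropping a leading BOM if one is present.
sub utf16le_to_utf8 {
    my ($bytes) = @_;
    my $text = decode('UTF-16LE', $bytes);   # bytes -> Perl characters
    $text =~ s/^\x{FEFF}//;                  # the BOM survives decoding as U+FEFF
    return encode('UTF-8', $text);           # characters -> UTF-8 bytes
}

# Build a small UTF-16LE sample with a BOM, then convert it.
my $utf16 = "\xFF\xFE" . encode('UTF-16LE', "Gr\x{E9}goire");
my $utf8  = utf16le_to_utf8($utf16);
print length($utf8), "\n";   # 9 bytes: the e-acute takes two in UTF-8
```

For whole files, the same decode/encode pair can be applied to data read
in binary mode.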


Encode::Guess may occasionally be useful:

 use Encode::Guess;

 my $decoder = Encode::Guess->guess("Grégoire");
 # On failure guess() returns an error string, not an object.
 die "Can't guess: $decoder" unless ref $decoder;
 print $decoder->name, "\n";    #---> utf8
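
A slightly more defensive variant, as a sketch only: the helper name
guess_name is hypothetical, and the sample byte strings are just
illustrations. It relies on the documented behaviour that guess_encoding
returns an encoding object on success and a plain error string otherwise,
so the result has to be checked with ref() before calling ->name.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding by default

# Hypothetical helper: return the guessed encoding name, or undef when
# the guess fails or is ambiguous. Extra suspects are optional.
sub guess_name {
    my ($bytes, @suspects) = @_;
    my $decoder = guess_encoding($bytes, @suspects);
    return ref($decoder) ? $decoder->name : undef;
}

my $utf8_bytes   = "Gr\xC3\xA9goire";   # "Grégoire" encoded as UTF-8
my $latin1_bytes = "Gr\xE9goire";       # "Grégoire" encoded as ISO-8859-1

for my $sample ($utf8_bytes, $latin1_bytes) {
    my $name = guess_name($sample);
    print defined $name ? "guessed: $name\n" : "could not guess\n";
}
```

With only the default suspects (ascii and utf8), the Latin-1 sample cannot
be identified, which is why the guess is only occasionally useful.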

-- 
Charles DeRykus
