Hi,

Jan Eden wrote on 18.03.2005:

>Hi,
>
>I have a bunch of files in the iso-8859-1 text encoding which I want
>to save (in an edited form) as UTF-8.
>
>I use the following line:
>
>use open IN => ':encoding(iso-8859-1)', OUT => ':utf8';
>
>and it does not work.
>
>This is strange, as I use this pragma all the time, and it always
>worked.
>
>When I open the original files, they are Latin-1 encoded. When I
>comment out the line above, the output files are also Latin-1 encoded.

This is driving me nuts. I built a minimal example now:
___

#!/usr/bin/perl -w

use strict;
use HTML::Entities;

use open IN => ':encoding(iso-8859-1)', OUT => ':utf8';

# This file is ISO-8859-1 encoded!
my $filename = "input.htm";

open PAGE, $filename or die "Cannot open $filename: $!";
my $content = join '', <PAGE>;
close PAGE or die "Cannot close $filename: $!";
exit unless $content;    # 'return' is only valid inside a subroutine

$content = decode_entities($content);

my $newfile = "test2.html";
open FILE, "> $newfile" or die "Cannot open $newfile: $!";
print $content;         # goes to STDOUT (redirected to test.html)
print FILE $content;    # goes to test2.html
close FILE or die "Cannot close $newfile: $!";
___

I call the script like this:

./test.pl > test.html

But only test.html contains valid UTF-8 text; test2.html has garbled
non-ASCII characters.
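
To pin down what "garbled" means here, the raw bytes of the two output files could be compared. The following is only a sketch: the helper names non_ascii_runs and slurp_raw are made up for illustration and are not part of any module. For orientation, the character ü (U+00FC) is the single byte FC in ISO-8859-1, the pair C3 BC in UTF-8, and C3 83 C2 BC when UTF-8 bytes are encoded to UTF-8 a second time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one hex string per run of non-ASCII bytes in a byte string.
# (non_ascii_runs is a hypothetical helper, not part of any module.)
sub non_ascii_runs {
    my ($bytes) = @_;
    my @runs;
    while ($bytes =~ /([\x80-\xFF]+)/g) {
        push @runs, join ' ', map { sprintf '%02X', ord($_) } split //, $1;
    }
    return @runs;
}

# Read a file as raw bytes so that no encoding layer gets in the way.
# (slurp_raw is likewise a hypothetical helper.)
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh or die "Cannot close $path: $!";
    return defined $bytes ? $bytes : '';
}

# Print the non-ASCII byte runs of every file named on the command line.
for my $path (@ARGV) {
    print "$path: $_\n" for non_ascii_runs(slurp_raw($path));
}
```

Saved under a hypothetical name such as bytes.pl, it could be run as "perl bytes.pl test.html test2.html". Seeing the same character as different byte sequences in the two files would show which handle encodes it differently.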

This is really confusing, especially since I was somehow able to create several
hundred correctly formatted files with my script earlier. I have no idea what
changed, or how I was able to output UTF-8 files before.

In the end, I will use a DBI method to store the content of my input files in a 
database (which still works, I just tested), but I am curious why

print $content;

would do something other than

print FILE $content;

Can someone shed some light on this?
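
One way to narrow this down might be to print the PerlIO layers that are actually active on each handle. As far as the open pragma documentation goes, the IN/OUT defaults apply to handles opened within the pragma's lexical scope, while STDIN, STDOUT and STDERR are only touched when ":std" is added. So the two print statements may well go through different layers. The sketch below is an assumption-laden diagnostic, not a fix: output_layers and layers_for_new_file are hypothetical helper names, and the scratch file comes from File::Temp rather than from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Same defaults as in the script above; note that there is no ':std' here.
use open IN => ':encoding(iso-8859-1)', OUT => ':utf8';

# Return the output-side PerlIO layers of a handle as one string.
# (output_layers is a hypothetical helper name.)
sub output_layers {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh, output => 1);
}

# Open $path for writing with a two-argument open, as the script above
# does, and report which layers the new handle received.
# (layers_for_new_file is a hypothetical helper name.)
sub layers_for_new_file {
    my ($path) = @_;
    open my $fh, "> $path" or die "Cannot open $path: $!";
    my $layers = output_layers($fh);
    close $fh or die "Cannot close $path: $!";
    return $layers;
}

# Scratch file used only for this check; removed when the program exits.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh or die "Cannot close $tmp_name: $!";

print "STDOUT: ", output_layers(\*STDOUT), "\n";
print "FILE:   ", layers_for_new_file($tmp_name), "\n";
```

If the FILE line lists "utf8" and the STDOUT line does not, the two handles are not treated alike, and adding ":std" to the "use open" line would be the next thing to try. The exact layer names can differ by platform and by settings such as PERL_UNICODE or the -C switch.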

Thanks,

Jan
-- 
I'd never join any club that would have the likes of me as a member. - Groucho 
Marx
