On 1/26/20 9:58 AM, Lars Noodén wrote:
> I've got a long script that has "use utf8;" near the top.  The script
> parses some HTML and then I run into trouble when printing the result as
> shown below:
> 
>     use utf8;
>     use HTML::TreeBuilder::XPath;
>     . . .
>     my $xhtml = HTML::TreeBuilder::XPath->new;
>     $xhtml->implicit_tags(1);
>     $xhtml->no_space_compacting(1);
>     $xhtml->parse_file($file)
>         or die("Could not parse '$file' : $!\n");
>     . . .
>     print $xhtml->as_XML_indented;
>     . . .
> 
> The exact error is:
> 
>     Wide character in print at ~/bin/script.pl line 147.
> 
> It does not object to 99% of the material I've run it over daily for
> months but something, somewhere in a recent file is causing the wide
> character problem.  It's also causing it to mangle the UTF-8 parts.
> 
> How do I get the HTML::TreeBuilder::XPath module to use UTF-8 all the
> way through?
> 
> /Lars
> 

It was pointed out off-list that the file being parsed can be forced
into a UTF-8 interpretation by opening it with an explicit encoding
layer and passing the filehandle to parse_file:

> my $filehandle;
> open ($filehandle, "< :encoding(UTF-8)", "htmlfile.html")
>     || die "$0: error: $!";
> $xhtml->parse_file($filehandle);

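That, I am very grateful to report, solved the problem.  The scope of
"use utf8;" is narrower than I had thought: it only tells Perl that the
script's own source text is UTF-8 and does nothing for data read from
or written to filehandles.  Thanks to all who took time to contemplate
the problem.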
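
For anyone who finds this in the archive, here is a minimal sketch of
the difference the encoding layer makes, using only core Perl.  The
file name and its contents are made up for illustration, and the
binmode call on STDOUT is my own addition rather than part of the fix
described above; it is the usual way to have print encode characters
on the way out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # only declares that this source file itself is UTF-8

# Hypothetical sample file, created here so the sketch is self-contained.
my $file = "sample-utf8.html";
open(my $out, ">:encoding(UTF-8)", $file)
    or die "$0: cannot write $file: $!";
print $out "<p>Noodén</p>\n";
close($out);

# Without a layer, Perl hands back raw bytes: the "é" arrives as two bytes.
open(my $raw, "<", $file) or die "$0: cannot read $file: $!";
my $bytes = <$raw>;
close($raw);

# With the layer, the same line is decoded into characters on input.
open(my $in, "<:encoding(UTF-8)", $file)
    or die "$0: cannot read $file: $!";
my $chars = <$in>;
close($in);

# Encode on output too, so characters leave the program as UTF-8 and
# code points above 255 do not trigger "Wide character in print".
binmode(STDOUT, ":encoding(UTF-8)");
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
print $chars;

unlink($file);
```

The two lengths differ by one because the undecoded line still counts
the two bytes of "é" separately.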

/Lars
