On 1/26/20 9:58 AM, Lars Noodén wrote:
> I've got a long script that has "use utf8;" near the top. The script
> parses some HTML and then I run into trouble when printing the result as
> shown below:
>
>     use utf8;
>     use HTML::TreeBuilder::XPath;
>     . . .
>     my $xhtml = HTML::TreeBuilder::XPath->new;
>     $xhtml->implicit_tags(1);
>     $xhtml->no_space_compacting(1);
>     $xhtml->parse_file($file)
>         or die("Could not parse '$file' : $!\n");
>     . . .
>     print $xhtml->as_XML_indented;
>     . . .
>
> The exact error is:
>
>     Wide character in print at ~/bin/script.pl line 147.
>
> It does not object to 99% of the material I've run it over daily for
> months, but something somewhere in a recent file is causing the wide
> character problem. It's also causing it to mangle the UTF-8 parts.
>
> How do I get the HTML::TreeBuilder::XPath module to use UTF-8 all the
> way through?
>
> /Lars
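For reference, a minimal, self-contained sketch of the usual two-sided fix, using only core Perl (File::Temp and PerlIO layers). It assumes nothing about the original script: the temporary file and its sample text are hypothetical stand-ins for the real HTML input. `use utf8;` only declares that the script's own source is UTF-8; input has to be decoded with a layer such as `<:encoding(UTF-8)`, and output has to be encoded, here with `binmode(STDOUT, ':encoding(UTF-8)')`. The STDOUT layer is an addition not mentioned in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # only declares that THIS source file is UTF-8; it does not touch I/O
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real input: write a small UTF-8 HTML file.
my ($out, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "<p>naïve café 日本</p>\n";
close($out) or die "$0: cannot close '$path': $!\n";

# Decode on the way in: the layer turns UTF-8 bytes into Perl characters.
# With HTML::TreeBuilder::XPath, this handle would be passed to parse_file().
open(my $in, '<:encoding(UTF-8)', $path)
    or die "$0: cannot open '$path': $!\n";
my $html = do { local $/; <$in> };   # slurp the whole file
close($in);

# Encode on the way out: without this layer, printing the CJK characters
# (above U+00FF) would emit "Wide character in print".
binmode(STDOUT, ':encoding(UTF-8)');
print $html;
```

Without an output layer, Perl warns only when a string contains characters above U+00FF; a string whose characters all fall at or below U+00FF (such as "café" on its own) is written as Latin-1 bytes with no warning, which is one way UTF-8 text ends up silently mangled.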
It was pointed out off-list that I had overlooked the fact that the file
being parsed can be forced to be read as UTF-8:

> my $filehandle;
> open($filehandle, "<:encoding(UTF-8)", "htmlfile.html")
>     || die "$0: error: $!";
> $xhtml->parse_file($filehandle);

That, I am very grateful to report, solved the problem. I guess the
scope of "use utf8;" is narrower than I had thought: it covers only the
script's own source text, not data read from files.

Thanks to all who took time to contemplate the problem.

/Lars

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/