Re: HTML::TokeParser and

John W. Krahn Tue, 28 Jan 2003 15:19:17 -0800

David Eason wrote:
> 
> I recreated a problem in my program in a small code sample. The code below
> is giving me the following output at the console and I have no idea why:
> 
> Output:
> line 1áline 3
> 
> I am seeing a lower case 'a' with an acute accent between 'line 1' and 'line
> 3'. Any idea what is going on?


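According to HTML::Entities, &nbsp; decodes to "\240" (character 0xA0, a
non-breaking space), which your console's code page displays as 'á':

 # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96)
 copy   => '©',  # copyright sign
 reg    => '®',  # registered sign
 nbsp   => "\240", # non breaking space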
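If you're ever unsure what character is really coming back from the parser,
it can help to print the character codes instead of the text itself. A
minimal sketch in core Perl; hex_dump is a made-up helper name, not part of
HTML::Parser or HTML::TokeParser:

```perl
use strict;
use warnings;

# Hypothetical debugging helper: show each character of a string as a
# two-digit hex code, so invisible characters such as "\240" (non-breaking
# space) stand out instead of being drawn by the console's code page.
sub hex_dump {
    my ($text) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $text;
}

# Prints: 6C 69 6E 65 20 31 A0 6C 69 6E 65 20 33
print hex_dump("line 1\240line 3"), "\n";
```

The A0 between "line 1" and "line 3" is the non-breaking space.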


> use strict;
> use warnings;
> use HTML::TokeParser;
> 
> my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!";
> while (my $tag = $p->get_tag())
> {
>     print $p->get_trimmed_text() if ($tag->[0] eq "dd")

    if ( $tag->[0] eq 'dd' ) {
        ( my $text = $p->get_trimmed_text() ) =~ tr/\240/ /;
        print $text;
        }

> }
> 
> __END__
> __DATA__
> <DD>line 1</DD>
> <DD>&nbsp;</DD>
> <DD>line 3</DD>

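Using both __END__ and __DATA__ is redundant, as the DATA filehandle will
read from after either one (unless this file is loaded with require from
another program, where only __DATA__ sets up DATA). Since __END__ comes
first here, the __DATA__ line itself is actually read as data.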

perldoc perldata
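If you'd rather skip the &nbsp;-only elements entirely than print a space, a
variation on the tr/// fix above is to trim after translating. This is a
sketch, assuming get_trimmed_text() hands back "\240" for the &nbsp; element
(which is what your output suggests); clean_text is a hypothetical helper,
and the list of strings just stands in for the parser so the example runs
without HTML::TokeParser installed:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TokeParser): turn non-breaking
# spaces ("\240", what &nbsp; decodes to) into ordinary spaces, then
# strip leading and trailing whitespace.
sub clean_text {
    my ($text) = @_;
    $text =~ tr/\240/ /;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

# Stand-ins for what get_trimmed_text() returns for the three <DD> elements.
my @samples = ( 'line 1', "\240", 'line 3' );

for my $raw (@samples) {
    my $text = clean_text($raw);
    print "$text\n" if length $text;    # skip elements that held only &nbsp;
}
```

The tr/// runs before the trim because, on a plain byte string, \s won't
necessarily match "\240".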


John
-- 
use Perl;
program
fulfillment
