John W. Krahn wrote:
> According to HTML::Entities
>
>  # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96)
>  copy   => '©',  # copyright sign
>  reg    => '®',  # registered sign
>  nbsp   => "\240", # non breaking space

Thanks, John. I had no idea where to look. I didn't know a non-breaking
space was an actual character; I thought it was just a directive to the
browser.  I have corrected the code below accordingly, and it prints "line
1line 3" as desired.

use strict;
use warnings;
use HTML::TokeParser;

my $p = HTML::TokeParser->new(\*DATA) or die "Can't open: $!";
while (my $tag = $p->get_tag())
{
    if ($tag->[0] eq "dd")
    {
        my $text = $p->get_trimmed_text();
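        # Strip leading and trailing whitespace, including the
        # non-breaking space (\240) that &nbsp; decodes to.
        $text =~ s/^[\s\240]+|[\s\240]+$//g;
        print $text;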
    }
}

__DATA__

<DD>line 1</DD>
<DD>&nbsp;</DD>
<DD>line 3</DD>
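
For anyone who wants to try the trimming step on its own, here is a minimal sketch that uses only core Perl. The helper name trim_nbsp is hypothetical (it is not part of HTML::TokeParser or HTML::Entities), and the sample strings are made up for illustration. It assumes the text has already been entity-decoded, so that &nbsp; has become the single character "\240" (the same character as "\xA0", code point 160).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module: strip ordinary
# whitespace and non-breaking spaces (octal 240, hex A0) from both ends.
sub trim_nbsp {
    my ($text) = @_;
    $text =~ s/^[\s\xA0]+//;
    $text =~ s/[\s\xA0]+$//;
    return $text;
}

# "\240" (octal) and "\xA0" (hex) name the same character.
printf "%d\n", ord("\240");    # prints 160

# Made-up samples: plain text, a lone non-breaking space, and padded text.
for my $raw ("line 1", "\xA0", " \xA0line 3\xA0 ") {
    my $clean = trim_nbsp($raw);
    print "[$clean]\n";        # prints [line 1], then [], then [line 3]
}
```

Listing \xA0 explicitly in the character class keeps the behaviour the same whether or not a given perl treats that character as whitespace under \s.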


