John W. Krahn wrote:
> According to HTML::Entities
>
> # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96)
> copy => '©', # copyright sign
> reg => '®', # registered sign
> nbsp => "\240", # non breaking space

Thanks, John, I had no idea where to look. I didn't know a non-breaking space was an actual character; I thought it was just a directive to the browser. I have corrected the code below accordingly, and it prints "line 1line 3" as desired.

use strict;
use warnings;
use HTML::TokeParser;

my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!";

while (my $tag = $p->get_tag()) {
    if ($tag->[0] eq "dd") {
        my $text = $p->get_trimmed_text();
        # Strip leading and trailing whitespace, including non-breaking spaces (\240)
        $text =~ s/^[\s\240]*(.*?)[\s\240]*$/$1/;
        print $text;
    }
}

__DATA__
<DD>line 1</DD>
<DD>&nbsp;</DD>
<DD>line 3</DD>