RE: Parsing HTML

Charles K. Clarkson Mon, 29 Aug 2005 11:03:40 -0700

Scott Taylor wrote:


: Is there a better, maybe more elegant, way to do this?  I wouldn't
: mind using HTML::Parser if I could only figure out how.

use strict;
use warnings;
use HTML::TokeParser;

my $html = q(

    This is a line of HTML:people write strange things here<br>
    and hardly ever follow proper<p>
    syntax A&amp;B suck at spelling as well<br>
    So I need to clean it up and strip out all<br>

    words less then 3 characters in length.<p>

    Later the words will go into an indexer for<br>
    searching a database

);

my $p = HTML::TokeParser->new( \$html );

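# Print the text before the first tag. Without this call the loop's
# first get_token would consume that text and it would never be printed.
print $p->get_trimmed_text;

while ( my $token = $p->get_token ) {
    # get_trimmed_text returns the text up to the next tag and decodes
    # entities, so A&amp;B comes out as A&B.
    my $string = $p->get_trimmed_text;

    # Start tags arrive as [ 'S', $tag, ... ]; break the line at <br> and <p>.
    $string = "\n$string"
        if $token->[0] eq 'S' && $token->[1] =~ /^(?:br|p)$/;
    print $string;
}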

__END__
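
The loop above only extracts the text. For the other half of the question, dropping words shorter than 3 characters, here is one possible sketch. Note that keep_long_words is a hypothetical helper of my own naming, not part of HTML::TokeParser, and it assumes a "word" is any run of \w characters, which may or may not suit your indexer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from HTML::TokeParser): split a string into
# words and keep only those with at least $min characters (default 3).
sub keep_long_words {
    my ( $text, $min ) = @_;
    $min = 3 unless defined $min;

    # Assumption: a word is any run of letters, digits, or underscores.
    my @words = $text =~ /(\w+)/g;
    return grep { length($_) >= $min } @words;
}

my @words = keep_long_words('So I need to clean it up and strip out all');
print join( ' ', @words ), "\n";    # prints "need clean and strip out all"
```

You could call it on each $string inside the loop instead of printing the string directly. With this definition of a word, "A&B" yields nothing, because both "A" and "B" are shorter than 3 characters.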

HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328

