Re: read source file of .html

Briac Pilpré Mon, 14 Jan 2002 23:33:21 -0800

Gary Hawkins wrote:
> Along that line, I would like to be able to wind up with pages after retrieval
> as plain text without html tags, hopefully using a module.


Here's a really quick way to do so using HTML::Parser; it can probably
use some tweaking.

Hope this helps,
Briac

#!/usr/bin/perl -w
use strict;
use HTML::Parser 3;
use LWP::Simple;

# Fetch the page; get() returns undef on failure.
my $html = get("http://www.mit.edu") or die "Couldn't fetch the page";

# Print the decoded text of every text node, skipping <script> and <head>.
my $parser = HTML::Parser->new(
        unbroken_text   => 1,
        ignore_elements => [qw( script head )],
        text_h  => [ sub {print shift}, 'dtext']
)->parse($html)->eof();

__END__
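
One possible tweak, as a rough sketch only: collect the text into a string
instead of printing it, so the result can be post-processed or tested without
a network fetch. The helper name html_to_text and the sample markup are
made up for illustration; the parser options are the same ones used in the
script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3;

# Hypothetical helper (not part of HTML::Parser): returns the text
# content of $html with tags removed and entities decoded.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version     => 3,
        unbroken_text   => 1,                      # do not split text runs
        ignore_elements => [qw( script head )],    # skip these elements entirely
        # 'dtext' passes the handler the text with entities decoded
        text_h          => [ sub { $text .= shift }, 'dtext' ],
    );

    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return $text;
}

# Made-up sample input, so no page has to be fetched.
my $sample = '<html><head><title>T</title></head>'
           . '<body><p>Hello &amp; <b>welcome</b></p></body></html>';

print html_to_text($sample), "\n";
```

Returning a string makes it easy to squeeze whitespace or write the result
to a file afterwards, which plain print does not allow.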

-- 
briac
        A flying lark. Five 
        trout swim in the pond. Four foxes 
        under a she-oak.
