--- Octavian Rasnita <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I want to get a web page and remove all the HTML tags from it, then save the
> visible text only. Like saving the file as text from Internet Explorer.
>
> Do you know a Perl module that can help me to find and remove all the HTML
> tags?
> I was thinking to use regular expressions, but I may forget a lot of things.

You can also use HTML::TokeParser::Simple for this:

    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $somefile = shift @ARGV;    # path to the saved HTML file
    my $p = HTML::TokeParser::Simple->new( $somefile );

    # skip to the body; the loop simply ends if there is no <body> tag
    while ( my $token = $p->get_token ) {
        last if $token->is_start_tag( 'body' );
    }

    while ( my $token = $p->get_token ) {
        next unless $token->is_text;    # skip tags, comments, etc.
        print $token->as_is;            # called return_text in older releases
    }

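
If the page is already in a string rather than a file (for example, after fetching it with LWP::Simple's get), the same idea can be wrapped in a small function. This is only a rough sketch: the name visible_text is made up for illustration, and it assumes the parser is given a reference to the string so that it reads the HTML itself instead of treating it as a file name.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): collect the text tokens
# that appear after the opening <body> tag of an HTML string.
sub visible_text {
    my ($html) = @_;

    # A scalar reference tells the parser to read from the string.
    my $p = HTML::TokeParser::Simple->new( \$html );

    my $in_body = 0;
    my $text    = '';
    while ( my $token = $p->get_token ) {
        if ( $token->is_start_tag('body') ) {
            $in_body = 1;
            next;
        }
        next unless $in_body && $token->is_text;
        $text .= $token->as_is;    # raw text, entities are not decoded
    }
    return $text;
}

print visible_text('<html><head><title>T</title></head><body><p>Hello, <b>world</b>!</p></body></html>'), "\n";
```

Note that this returns an empty string when the document has no explicit <body> tag, and that anything inside <script> or <style> elements in the body also arrives as text tokens, so you may want to skip those as well.
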
Cheers,
Curtis "Ovid" Poe
=====
"Ovid" on http://www.perlmonks.org/
Someone asked me how to count to 10 in Perl:
push@A,$_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//;
shift@a;shift@a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A