Re: HTML tags - module

Ovid Wed, 17 Jul 2002 10:51:51 -0700

--- Octavian Rasnita <[EMAIL PROTECTED]> wrote:
> Hi all,
> 
> I want to get a web page and remove all the HTML tags from it, then save the
> visible text only. Like saving the file as text from Internet Explorer.
> 
> Do you know a Perl module that can help me to find and remove all the HTML
> tags?
> I was thinking to use regular expressions, but I may forget a lot of things.


You can also use HTML::TokeParser::Simple for this:

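  use strict;
  use warnings;
  use HTML::TokeParser::Simple;

  my $somefile = shift @ARGV;   # for example: path to a saved HTML file
  my $p = HTML::TokeParser::Simple->new( $somefile )
      or die "Cannot open $somefile: $!";

  # skip to the body; ends quietly at end of input if there is no <body>
  while ( my $token = $p->get_token ) {
      last if $token->is_start_tag( 'body' );
  }

  # print text tokens only; tags and comments are dropped
  while ( my $token = $p->get_token ) {
      next unless $token->is_text;
      print $token->as_is;   # as_is replaces the deprecated return_text
  }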
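
One caveat with the loop above: the contents of <script> and <style> elements also arrive as text tokens, so they end up in the output. Here is a rough sketch of one way around that, assuming a version of HTML::TokeParser::Simple recent enough to have as_is. The sub name visible_text is made up for illustration, and passing a scalar reference to new() to parse a string is the HTML::TokeParser convention the module inherits. Entities such as &amp; are left undecoded.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): return the text found
# inside <body>, ignoring anything inside <script> or <style>.
sub visible_text {
    my ($html) = @_;

    # A scalar reference tells the parser to read from a string.
    my $p = HTML::TokeParser::Simple->new( \$html );
    my ( $in_body, $skip, $text ) = ( 0, 0, '' );

    while ( my $token = $p->get_token ) {
        if ( $token->is_start_tag('body') ) { $in_body = 1; next }
        next unless $in_body;

        if ( $token->is_start_tag('script') || $token->is_start_tag('style') ) {
            $skip++;                      # entering a non-visible element
        }
        elsif ( $token->is_end_tag('script') || $token->is_end_tag('style') ) {
            $skip-- if $skip;             # leaving it again
        }
        elsif ( $token->is_text && !$skip ) {
            $text .= $token->as_is;       # raw text, entities not decoded
        }
    }
    return $text;
}
```

To save the result, print the return value to a filehandle opened for writing. The HTML string itself could come from a slurped file or from LWP::Simple's get().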

Cheers,
Curtis "Ovid" Poe

=====
"Ovid" on http://www.perlmonks.org/
Someone asked me how to count to 10 in Perl:
push@A,$_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//;
shift@a;shift@a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A

