--- Octavian Rasnita <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I want to get a web page and remove all the HTML tags from it, then save the
> visible text only. Like saving the file as text from Internet Explorer.
>
> Do you know a Perl module that can help me to find and remove all the HTML
> tags?
> I was thinking to use regular expressions, but I may forget a lot of things.

You can also use HTML::TokeParser::Simple for this:

    use HTML::TokeParser::Simple;

    # $somefile holds the path to the HTML file
    my $p = HTML::TokeParser::Simple->new( $somefile );

    # skip to the body; stop quietly if the document has no <body> tag
    while ( my $token = $p->get_token ) {
        last if $token->is_start_tag( 'body' );
    }

    # print only the text tokens
    while ( my $token = $p->get_token ) {
        next unless $token->is_text;    # skip non-visible stuff
        print $token->return_text;
    }

Cheers,
Curtis "Ovid" Poe

=====
"Ovid" on http://www.perlmonks.org/