--- [EMAIL PROTECTED] wrote:
> Hey guys,
>
> thanks for all the help with this.  I actually did mean HTML Links as I am
> looking to parse out specific links from an HTML file.  I'm not only
> concerned with "HTTP" link (<a href>) but also other HTML flags.  Right
> now I'm using HTML::SimpleLinkExtor but I'm not sure that gives me exactly
> what I want.
>
> Essentially what I'm trying to do is parse out all info from a web page
> that is in bold (<b>text</b>).  I'm going to revisit LinkExtor but if
> there is a better solution, I'm all ears.
>
> Greg

Greg,

I was playing around with a similar problem and subclassed
HTML::TokeParser as HTML::TokeParser::Easy.  To do what you're looking
for, you could use that module and do this:

############################################
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser::Easy;

# Slurp the whole document from the DATA section in one read.
my $file;
{
    local $/;
    $file = <DATA>;
}

# Note:  If you pass it a file name instead of the file contents,
# pass the name directly and *not* as a reference!!!
# see perldoc HTML::TokeParser for more info.
my $p = HTML::TokeParser::Easy->new( \$file );

while ( my $token = $p->get_token ) {
    if ( $p->is_start_tag( $token ) and $p->return_tag( $token ) eq 'b' ) {
        my $bold_text = '';
        $token = $p->get_token;
        # Collect everything up to the matching </b>.  The defined()
        # check stops the loop if the document ends before </b> shows up.
        while ( defined $token
                and ! ( $p->is_end_tag( $token ) and $p->return_tag( $token ) eq 'b' ) ) {
            $bold_text .= $p->return_text( $token );
            $token = $p->get_token;
        }
        print "$bold_text\n";
    }
}
__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Untitled</title>
</head>
<body>
<h1>test</h1>
<b>This is the first <i>bold</i> text.</b>
<i>This should not appear.</i>
<b>This is the second bold text.</b>
</body>
</html>
############################################

The output from the above is:

This is the first <i>bold</i> text.
This is the second bold text.

To use it, you would have to install HTML::TokeParser and my
HTML::TokeParser::Easy module (which I just uploaded at
http://www.easystreet.com/~ovid/cgi_course/downloads/Easy.pm).  I haven't
bothered to create a complete install package for it, so go into one of
your Perl lib directories and, in an HTML folder (something like
/usr/bin/perl/site/lib/html/), create a TokeParser directory and place
Easy.pm in that directory.  Full POD is included so, after you install
it, you can type 'perldoc HTML::TokeParser::Easy' to see how to use it.

Frankly, I think the module is a bit of a hack, but if it works...

Cheers,
Curtis "Ovid" Poe

=====
Senior Programmer
Onsite! Technology (http://www.onsitetech.com/)
"Ovid" on http://www.perlmonks.org/
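
P.S.  If installing Easy.pm by hand is a nuisance, roughly the same loop
can be written against stock HTML::TokeParser, which hands back each
token as a plain array reference.  What follows is only a minimal sketch
under that assumption, not the code from Easy.pm: the sub name
bold_chunks and the sample string are made up for illustration.  Nested
markup is kept as raw text, like the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the contents of every <b>...</b> element
# in $html, keeping nested markup (such as <i>) as raw text.
sub bold_chunks {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html )
        or die "Cannot set up parser: $!";
    my @chunks;
    while ( my $token = $p->get_token ) {
        # Start tags arrive as [ 'S', $tag, $attr, $attrseq, $text ]
        next unless $token->[0] eq 'S' and $token->[1] eq 'b';
        my $bold_text = '';
        # Stops at </b>, or at end of document if </b> never shows up.
        while ( my $inner = $p->get_token ) {
            # End tags arrive as [ 'E', $tag, $text ]
            last if $inner->[0] eq 'E' and $inner->[1] eq 'b';
            # Text tokens carry their text at index 1; for every other
            # token type the original source text is the last element.
            $bold_text .= $inner->[0] eq 'T' ? $inner->[1] : $inner->[-1];
        }
        push @chunks, $bold_text;
    }
    return @chunks;
}

# Made-up sample input, just to exercise the helper.
my $sample = '<b>This is the first <i>bold</i> text.</b>'
           . '<i>This should not appear.</i>'
           . '<b>This is the second bold text.</b>';
print "$_\n" for bold_chunks($sample);
```

If you only want the words and not the nested tags, perldoc
HTML::TokeParser also describes get_tag and get_text, which may be a
shorter route.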