On Friday 07 April 2006 13:15, [EMAIL PROTECTED] wrote:
> I'm trying to learn web scraping and am stopped at the basic point of
> scraping a portion of a web page. I'm able to scrape a full page and
> save it as *.xml or *.htm, and I think I understand regex, but the
> following fails:
>
> **************
> # Prints a portion of a red cross web page to a new htm file.
>
> use strict;
> use warnings;
> use LWP::Simple;
> use WWW::Mechanize;
>
> my $url = 'http://www.redcrossnca.org/ServiceCenters/montgomery.php3';
>
> getstore( $url, 'c://redcross.htm' );
>
> open PAGE, 'c://redcross.htm';
> while( my $line = <PAGE> ) {
>     $line =~ /Health and Safety Classes/
>     print "$1\n";
> }
>
> close PAGE;
> ********
>
> Once I get the syntax straight I'll go after more detailed scrapes.
>
> Ken
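
A note on why the snippet as posted fails: the match statement has no terminating semicolon, so the script does not compile, and the pattern has no capturing parentheses, so $1 would be undefined even once the semicolon is added. Below is a minimal sketch of a corrected loop, assuming the goal is only to print the matched heading. The sample markup and the sub name grep_heading are made up for illustration, and an in-memory filehandle stands in for the saved file so nothing is fetched.

```perl
use strict;
use warnings;

# Stand-in for the saved page; this markup is invented for illustration.
my $html = <<'HTML';
<html><body>
<h2>Health and Safety Classes</h2>
<p>CPR, first aid, and babysitting courses.</p>
</body></html>
HTML

# Hypothetical helper: $source is a file name or a reference to a string.
sub grep_heading {
    my ($source) = @_;

    # Three-argument open with an error check.
    open my $fh, '<', $source or die "Cannot open page: $!";

    my @found;
    while ( my $line = <$fh> ) {
        # The parentheses create capture group 1, so $1 is set on a match.
        if ( $line =~ /(Health and Safety Classes)/ ) {
            push @found, $1;
        }
    }
    close $fh;
    return @found;
}

print "$_\n" for grep_heading( \$html );
```

To run it against the saved file instead, pass the file name, e.g. grep_heading('c:/redcross.htm').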

Have you looked into HTML::TokeParser? It might help with your web scraping needs. You can also read a great article by Chris Ball at:

http://www.perl.com/pub/a/2003/01/22/mechanize.html
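
To give a rough idea of what HTML::TokeParser looks like in use, here is a minimal sketch that pulls out the paragraph following a given heading. The structure of the real page is not known here, so the sample markup (an h2 followed by a p) and the sub name text_after_heading are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Invented sample markup; the real page will almost certainly differ.
my $html = <<'HTML';
<html><body>
<h2>Volunteer Opportunities</h2>
<h2>Health and Safety Classes</h2>
<p>CPR, first aid, and babysitting courses.</p>
</body></html>
HTML

# Hypothetical helper: return the text of the first <p> after the
# <h2> whose text equals $heading, or undef if there is none.
sub text_after_heading {
    my ( $doc, $heading ) = @_;

    # A reference to a scalar is parsed as a literal document.
    my $p = HTML::TokeParser->new( \$doc )
        or die "Cannot parse document";

    while ( $p->get_tag('h2') ) {
        # Read the heading text up to its closing tag.
        my $title = $p->get_trimmed_text('/h2');
        next unless $title eq $heading;

        # Move to the next paragraph and return its text.
        $p->get_tag('p') or return;
        return $p->get_trimmed_text('/p');
    }
    return;
}

my $text = text_after_heading( $html, 'Health and Safety Classes' );
print defined $text ? "$text\n" : "heading not found\n";
```

HTML::TokeParser->new also accepts a file name, so the page saved with getstore can be parsed directly instead of a string.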