On Tue, 11 Apr 2006 18:12:16 -0400, <[EMAIL PROTECTED]> wrote:

I am slowly making my way through the process of scraping the data behind a form, and I can now get five results plus a series of links using the script below. I need help with the following: 1) eliminating all material on the page other than the list and the links (and ultimately eliminating the link numbers); 2) following the links so that the five listings behind each link are returned; 3) returning the results for all states (i.e. all listings) rather than just Ohio. From my tutorial, it looks like I need foreach my $link ($browser->find_all_links( url_regex => SOMETHING )) { and that the SOMETHING is based on the URL that appears upon following a link. But from there I'm stumped. The URL of the form is in the script below. Thanks in advance.

Ken

use strict;
use warnings;

use WWW::Mechanize;

my $output_dir   = "c:/training/bc";
my $starting_url = "http://www.theblackchurchpage.com/modules.php?name=Locator";

my $browser = WWW::Mechanize->new();
$browser->get( $starting_url );

# Select the third form on the page and search for Ohio.
$browser->form_number( 3 );
$browser->field( "church_state", "OH" );
$browser->submit();

# Save the returned page (the content is HTML despite the .xls extension).
open my $out, '>', "$output_dir/bc7.xls" or die "Can't open file: $!";
print {$out} $browser->content;
close $out or die "Can't close file: $!";

# Also echo the page to the screen.
print $browser->content;


I haven't seen a response to the questions I posted yesterday (above). Not clear? Not possible? Too hard? Too obvious? To elaborate on the first question: in the tutorial example, the foreach my $link loop uses the regex qr/cd.asp/, which is based on a URL of the form
homepage.com/sub/sub1/sub2/cd.asp?I=500031
The URL of a link on the page I'm trying to scrape is
http://www.theblackchurchpage.com/modules.php?name=Locator&op=search&pnum=2&ccount=5&offset=5&church_name=&church_state=oh&church_city=&church_denom=&church_pastor=
I don't see how to make a regex from that. Help with any of the questions would be appreciated.

Ken
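
One possible way to build the regex is to match only the part of the query string that distinguishes a results-page link, rather than the whole URL. The sketch below assumes that every paging link carries a numeric pnum parameter, as the URL quoted above does; that has not been checked against the other links on the page. The names $page_link_re, is_page_link and fetch_result_pages are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Paging URL quoted in the message above.
my $sample = 'http://www.theblackchurchpage.com/modules.php?name=Locator&op=search&pnum=2&ccount=5&offset=5&church_name=&church_state=oh&church_city=&church_denom=&church_pastor=';

# Assumption: a results-page link is any URL with a numeric "pnum" parameter.
# The character class allows "?", "&" or ";" in front of the parameter name.
my $page_link_re = qr/[?&;]pnum=\d+/;

# Hypothetical helper: true if the URL looks like a results-page link.
sub is_page_link {
    my ($url) = @_;
    return $url =~ $page_link_re ? 1 : 0;
}

# Hypothetical helper: given a WWW::Mechanize object that has already
# submitted the search form, collect the content of the first page and of
# every distinct results page linked from it.
sub fetch_result_pages {
    my ($browser) = @_;
    my @pages = ( $browser->content );
    my %seen;
    for my $link ( $browser->find_all_links( url_regex => $page_link_re ) ) {
        my $url = $link->url_abs->as_string;
        next if $seen{$url}++;    # skip links repeated on the page
        $browser->get($url);
        push @pages, $browser->content;
    }
    return @pages;
}

# Quick check of the pattern without touching the network.
for my $url ( $sample, 'http://www.theblackchurchpage.com/modules.php?name=Locator' ) {
    print is_page_link($url) ? "paging link: " : "other link:  ", $url, "\n";
}
```

In the original script, fetch_result_pages($browser) would be called right after $browser->submit(). The list of links is gathered before the first get, so moving to another page inside the loop does not disturb the iteration.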

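
For the third item (all states rather than just Ohio), one option is to repeat the get, field and submit steps once per state. This is only a sketch: it assumes the form accepts the same two-letter postal codes as the "OH" used in the script, which has not been verified for every state, and the helpers output_path and search_state are hypothetical names introduced here.

```perl
use strict;
use warnings;

# Assumption: church_state accepts two-letter postal codes such as "OH".
my @states = qw(
    AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
    MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
    SD TN TX UT VT VA WA WV WI WY
);

# Hypothetical helper: build one output file name per state.
sub output_path {
    my ( $dir, $state ) = @_;
    $dir =~ s{/+$}{};    # drop any trailing slash before joining
    return sprintf '%s/bc_%s.html', $dir, lc $state;
}

# Hypothetical helper: repeat the steps of the original script for one
# state. $browser is expected to be a WWW::Mechanize object.
sub search_state {
    my ( $browser, $starting_url, $state ) = @_;
    $browser->get($starting_url);
    $browser->form_number(3);
    $browser->field( 'church_state', $state );
    $browser->submit();
    return $browser->content;
}

# Show the file name that would be used for Ohio.
print output_path( 'c:/training/bc/', 'OH' ), "\n";
```

A driver loop would then call search_state($browser, $starting_url, $state) for each entry in @states and write the returned content to output_path($output_dir, $state). Pausing between requests with sleep would be a courtesy to the server.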

