Re: Scraping Data Behind a Form

kc68 Wed, 12 Apr 2006 12:21:26 -0700

On Tue, 11 Apr 2006 18:12:16 -0400, <[EMAIL PROTECTED]> wrote:

I am slowly making my way through the process of scraping the data behind a form and can now get five results plus a series of links using the script below. I need help in doing the following: 1) Eliminating all material on the page other than the list and the links (and ultimately eliminating the link numbers); 2) following the links so that the five listings behind each link are returned; 3) Returning the results for all states (i.e. all listings) rather than just Ohio. From my tutorial it looks like I need foreach my $link ($browser->find_all_links( url_regex => SOMETHING )) { - and that the something is based on the url that appears upon executing a link. But from there I'm stumped. The url of the form is in the script below. Thanks in advance.
Ken

use strict;
use warnings;

use WWW::Mechanize;

my $output_dir = "c:/training/bc";

my $starting_url =
    "http://www.theblackchurchpage.com/modules.php?name=Locator";

my $browser = WWW::Mechanize->new();

# Fetch the locator page and submit the third form with the state set to Ohio.
$browser->get( $starting_url );
$browser->form_number( 3 );
$browser->field( "church_state", "OH" );
$browser->submit();

# Save the result page (raw HTML, despite the .xls extension).
open my $out, '>', "$output_dir/bc7.xls" or die "Can't open file: $!";
print $out $browser->content;
close $out;

print $browser->content;

I haven't seen a response to my questions (above) posted yesterday. Not clear? Not possible? Too hard? Too obvious? To elaborate on the first question: in the tutorial example, the foreach my $link does a regex qr/cd.asp/ based on a url that is

homepage.com/sub/sub1/sub2/cd.asp?I=500031
The url at a link on the page I'm trying to scrape is
http://www.theblackchurchpage.com/modules.php?name=Locator&op=search&pnum=2&ccount=5&offset=5&church_name=&church_state=oh&church_city=&church_denom=&church_pastor=

I don't see how to make a regex from that. Anything on any of the questions would help.


Ken
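
One possible reading of the two URLs above: the tutorial regex keys on the script name (cd.asp), whereas here every link goes to modules.php and only the query string differs, so the regex has to key on a parameter instead. A minimal sketch, assuming every paging link on the results page carries op=search and a numeric pnum (that is a guess from the single URL quoted, and the names $paging_link and paging_urls are made up for illustration):

```perl
use strict;
use warnings;

# The two URLs quoted above: the tutorial's link and a paging link
# from the results page.
my @urls = (
    'homepage.com/sub/sub1/sub2/cd.asp?I=500031',
    'http://www.theblackchurchpage.com/modules.php?name=Locator&op=search&pnum=2&ccount=5&offset=5&church_name=&church_state=oh&church_city=&church_denom=&church_pastor=',
);

# Assumption: paging links contain op=search followed by pnum=<digits>.
my $paging_link = qr/op=search.*\bpnum=\d+/;

# Return only the URLs that look like paging links.
sub paging_urls {
    my @candidates = @_;
    return grep { $_ =~ $paging_link } @candidates;
}

# Prints only the second URL; the tutorial URL does not match.
print "$_\n" for paging_urls(@urls);
```

In the script, the same compiled regex could be handed to WWW::Mechanize as find_all_links( url_regex => $paging_link ), and each returned link object fetched with $browser->get( $link->url_abs ). Whether that picks up every page depends on how the site actually builds its links, which the sketch does not check.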



