On Thu, 01 Jun 2006 20:14:36 -0400, David Romano <[EMAIL PROTECTED]> wrote:

Hi kc68,

On 6/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
The script below gets me only as far as printing the page, to the screen and to a file, but without the list of names in the middle. Without the if line I get an endless scroll. I want to be able to pull in all the names and then isolate and print one (e.g. abercrombie). Guidance and an actual script would be appreciated.
I'm not certain what you're trying to do, but hopefully this helps you:
#!/usr/bin/perl

use strict;
use warnings; # shows that PAGE isn't used at all

use WWW::Mechanize;

my $output_dir = "c:/training/bc";

my $starting_url = "http://clerk.house.gov/members/olmbr.html";

my $browser = WWW::Mechanize->new();

$browser->get( $starting_url );

$browser->submit();
# Looking at the URL, there's no form to submit, so I don't think you
# need the line above: go straight to fetching the contents using
# $browser->content. Re-read the WWW::Mechanize documentation.

foreach my $line (split(/[\n\r]+/, $browser->content)) {

        if $ line =~ /abercrombie/

        print $browser->content;

}
# The above doesn't even compile for me: there's a space between '$'
# and 'line', and there are no parentheses around the condition or curly
# brackets to say you want to print $browser->content when $line matches.
# Even once it compiles, the body prints the whole page rather than
# $line; without the if, it does so once for every line of the page,
# which would explain the endless scroll. Your regular expression
# (looking at the page you're scraping) needs the 'i' modifier so that
# letter case doesn't matter. A great resource is
# http://perldoc.perl.org/perlretut.html ; perlretut will also show you
# other ways to capture the information you want without having to use
# a split to iterate line by line.

  open OUT, ">$output_dir/simple2.html" or die "Can't open file:$!";

  print OUT $browser->content;

  close OUT;
# I take it this is for debugging purposes, to make sure the webpage
# you scraped was the right one?

close PAGE;
# PAGE isn't used anywhere else.
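A minimal corrected sketch of that loop, applying the fixes from the comments above: the block form of `if` gets parentheses and braces, each matching line contributes only its captured name instead of the whole page being printed, and an `/i` match then isolates one name. It runs offline because `$content` is a hypothetical stand-in for `$browser->content`; the sample markup, and the assumption that each name sits on its own line of the real page, are guesses rather than facts about that page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $browser->content. The markup, and the idea
# that each name sits on its own line, are assumptions about the real page.
my $content = <<'HTML';
<html><body>
<tr><td><font size=2><i>Abercrombie, Neil</i></font></td></tr>
<tr><td><font size=2><i>Doe, Jane</i></font></td></tr>
</body></html>
HTML

my @names;
foreach my $line (split /[\n\r]+/, $content) {
    # Block form of if: parentheses around the condition, braces around the body.
    if ($line =~ /size=2><i>([^<]+)/) {
        push @names, $1;    # $1 holds the text captured by ([^<]+)
    }
}

print "$_\n" for @names;                                  # every name found
print "Found: $_\n" for grep { /abercrombie/i } @names;   # /i ignores letter case
```

Collecting the names into an array first keeps "pull in all names" and "isolate one" as two separate steps, so the same @names list can be filtered again for any other representative.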

Below is what I think you're basically trying to get done. See if it works for you:
#!/usr/bin/perl

use strict;
use warnings;

use WWW::Mechanize;

my $output_dir = "c:/training/bc";

my $starting_url = "http://clerk.house.gov/members/olmbr.html";

my $browser = WWW::Mechanize->new();

$browser->get( $starting_url );

print "$_\n" for ($browser->content =~ /(?<=size=2><i>) [^<]+/gx);
# or maybe
# for ($browser->content =~ /(?<=size=2><i>) [^<]+/gx) {
#     print "$_\n" if /abercrombie/i;
# }
# ?
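How the one-liner above works, shown on a small hypothetical HTML snippet so it runs without fetching the page. A match with `/g` in list context and no capturing parentheses returns every full match; the lookbehind `(?<=size=2><i>)` requires that markup just before each match without including it; and `/x` makes the space in the pattern insignificant. The snippet's markup is an assumption modelled on the pattern, not copied from the real page, and "Doe, Jane" is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample markup; the real page's HTML may differ.
my $html = <<'HTML';
<tr><td><font size=2><i>Abercrombie, Neil</i></font></td></tr>
<tr><td><font size=2><i>Doe, Jane</i></font></td></tr>
<tr><td><font size=3>Not a member name</font></td></tr>
HTML

# List context + /g + no capture groups: returns each whole match.
# (?<=size=2><i>) is a zero-width lookbehind, so the tag text is not
# part of the result; [^<]+ takes everything up to the next '<'.
my @names = ($html =~ /(?<=size=2><i>) [^<]+/gx);

print "$_\n" for @names;                           # every name
print "$_\n" for grep { /abercrombie/i } @names;   # just the one you want
```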

HTH,
David

***********

The second option worked: it printed "Abercrombie, Neil" to the screen. I'm still working on basic concepts. Someone suggested the split construction as a way to pull in all the listings and, ultimately, all the votes. Can you complete that logic so it returns every line with a representative's name? Among my points of confusion: when does the print command go inside braces, and when does it go outside them?

Ken
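A short sketch contrasting the two forms of `if` behind the braces question, using hypothetical data. In the block form the condition goes in parentheses and the body in braces; the braces are required even for a single statement and may hold several. In the statement-modifier form, `if` (or `for`) follows one simple statement and needs neither parentheses nor braces, which is why `print "$_\n" for (...)` in David's first version has no braces at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = ('Abercrombie, Neil', 'Doe, Jane');   # hypothetical data

# Block form: condition in parentheses, body in braces.
# The braces are required and can hold several statements.
my @found_block;
for my $name (@names) {
    if ($name =~ /abercrombie/i) {
        print "block form: $name\n";
        push @found_block, $name;
    }
}

# Statement-modifier form: the condition follows one simple statement,
# with no parentheses or braces needed around it.
my @found_modifier;
for my $name (@names) {
    push @found_modifier, $name if $name =~ /abercrombie/i;
}

# A loop can be a modifier too: print sits outside any braces here.
print "modifier form: $_\n" for @found_modifier;
```

The modifier form applies to exactly one statement, so as soon as a match needs to do two things (print and store, as in the first loop), the block form is the natural choice.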




