On Thu, 01 Jun 2006 20:14:36 -0400, David Romano <[EMAIL PROTECTED]> wrote:
Hi kc68,
On 6/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
I'm not getting past printing to the screen and to a file the page in the
script below but without the list of names in the middle. Without the if
line I get an endless scroll. I want to be able to pull in all names and
then isolate and print one (e.g. abercrombie). Guidance and actual script
appreciated.
I'm not certain of what you're trying to do, but hopefully this helps you:
#!/bin/perl
use strict;
use warnings; # shows that PAGE isn't used at all
use WWW::Mechanize;
my $output_dir = "c:/training/bc";
my $starting_url = "http://clerk.house.gov/members/olmbr.html";
my $browser = WWW::Mechanize->new();
$browser->get( $starting_url );
$browser->submit();
# Looking at the url, there's no form to submit, so I don't think you
# need the line above: go straight to fetching the contents using
# $browser->content. Re-read WWW::Mechanize documentation.
foreach my $line (split(/[\n\r]+/, $browser->content)) {
if $ line =~ /abercrombie/
print $browser->content;
}
# The above doesn't even compile for me (there's a space between '$' and
# 'line', the condition has no parentheses around it, and there are no
# curly brackets to say you want to print $browser->content when $line
# matches). Your regular expression (looking at the page you're scraping)
# needs the 'i' modifier so that letter case doesn't matter. A great
# resource is http://perldoc.perl.org/perlretut.html . perlretut will also
# show you other ways to capture the information you want without having
# to use a split to iterate line by line.
open OUT, ">$output_dir/simple2.html" or die "Can't open file:$!";
print OUT $browser->content;
close OUT;
# I take it this is for debugging purposes, to make sure the webpage you
# scraped was the right one?
close PAGE;
# PAGE isn't used anywhere else.
Below is what I think you're basically trying to get done. See if it works for you:
#!/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $output_dir = "c:/training/bc";
my $starting_url = "http://clerk.house.gov/members/olmbr.html";
my $browser = WWW::Mechanize->new();
$browser->get( $starting_url );
print "$_\n" for ($browser->content =~ /(?<=size=2><i>) [^<]+/gx);
# or maybe
# for ($browser->content =~ /(?<=size=2><i>) [^<]+/gx) { print "$_\n" if /abercrombie/i }
# ?
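To see what the lookbehind pattern above picks up without hitting the live
site, here is a minimal sketch that runs it against a hard-coded fragment.
The HTML in $sample_html is invented for illustration and the real page's
markup may differ; only the size=2><i> prefix and the Abercrombie entry come
from this thread, and "Example, Pat" is a made-up placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $browser->content (assumption: one name per line).
my $sample_html = join "\n",
    '<font size=2><i>Abercrombie, Neil</i></font>',
    '<font size=2><i>Example, Pat</i></font>',
    '<font size=3><b>Not a name</b></font>';

# In list context, a /g match with no capture groups returns every matched
# substring. (?<=...) is a lookbehind: the prefix must be present but is not
# part of the match. Under /x the literal space in the pattern is ignored.
my @names = $sample_html =~ /(?<=size=2><i>) [^<]+/gx;

print "$_\n" for @names;

# Keep only the entries that match one name, ignoring letter case.
my @hits = grep { /abercrombie/i } @names;
print "Found: $_\n" for @hits;
```

Swapping $sample_html for $browser->content should give the same behaviour as
the one-liner above, provided the page really uses that prefix before each name.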
HTH,
David
***********
The second option worked to print Abercrombie, Neil to the screen. Still
working on basic concepts. The split construction was suggested by
someone as a way to get to pulling in all listings and ultimately all
votes. Can you complete that logic to return all lines with
representatives' names? Among my points of confusion, when is the print
command within braces and when is it outside braces?
Ken
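
On the two questions above, a minimal sketch, again using an invented
$content string in place of $browser->content. It assumes each name sits on
its own line right after size=2><i>, which is inferred from the pattern used
earlier in the thread and may not hold for the real page. As for print: it
goes inside braces when it is the body of a block-form if or foreach, and it
needs no braces when the if or for is written after it as a statement
modifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page content; use $browser->content in the real script.
my $content = "<html>\r\n<font size=2><i>Abercrombie, Neil</i>\r\n"
            . "<p>Some other text</p>\n<font size=2><i>Example, Pat</i>\n</html>\n";

my @name_lines;
foreach my $line (split /[\n\r]+/, $content) {
    # Block form: the condition needs parentheses and the body needs braces.
    if ($line =~ /size=2><i>/) {
        push @name_lines, $line;
        print "$line\n";    # print the matching line, not the whole page
    }
}

# Statement-modifier form: a single statement with the condition after it,
# so no braces are needed.
foreach my $line (@name_lines) {
    print "Match: $line\n" if $line =~ /abercrombie/i;
}
```

Printing $line rather than $browser->content inside the loop is what avoids
the endless scroll: the original printed the whole page once per line.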