Re: Multiple Page Scrape

kc68 Tue, 06 Jun 2006 15:52:15 -0700

Thanks, but that's complicated for true beginners. My first issue was figuring out which of three choices was XML::Simple. I chose to install XML-Simple-DTDReader over XML-Simpler or Test-XML-Simple. I later read that XML::Simple probably comes with ActivePerl.
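Before installing anything, Perl itself can tell you whether a module such as XML::Simple is already present, for example in an ActivePerl install. A minimal sketch using only core Perl: it tries to load each module inside eval and reports the version if loading works. The helper name module_version is just an illustration.

```perl
use strict;
use warnings;

# Return a module's version if it can be loaded, or undef if it can't.
sub module_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # XML::Simple -> XML/Simple.pm
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;               # UNIVERSAL::VERSION
    return defined $version ? $version : 'unknown';
}

for my $module (qw(XML::Simple WWW::Mechanize)) {
    my $version = module_version($module);
    print "$module: ",
        ( defined $version ? "installed, version $version" : 'not installed' ), "\n";
}
```

If a module shows up as "not installed", that's the one to look for in the package manager; modules that report a version don't need installing again.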

Then I read the FAQ for XML::Simple and found that "Although you can get by without using any options, you shouldn't even consider using XML::Simple in production until you know what these two options do: forcearray, keyattr"
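Here is a small sketch of what those two options change, run against a made-up snippet (the element names imitate a roll call but are an assumption, not copied from the Clerk's file). With the defaults, a lone <recorded-vote> comes back as a hash instead of a one-element array, so a loop that expects an array breaks when there is only one. ForceArray forces an array for the named elements, and KeyAttr => [] switches off the default folding of repeated elements into a hash keyed on their name, key or id attribute.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Made-up document shaped like a roll call, with only one vote in it
my $xml = <<'XML';
<votes>
  <recorded-vote>
    <legislator name-id="X000001">Smith</legislator>
    <vote>Yea</vote>
  </recorded-vote>
</votes>
XML

# Default options: the single <recorded-vote> becomes a hash, not an array
my $default = XMLin($xml);
print ref $default->{'recorded-vote'}, "\n";    # HASH

# ForceArray makes <recorded-vote> always an array, so the loop below works
# for one vote or 435; KeyAttr => [] disables folding on name/key/id.
my $safe = XMLin( $xml, ForceArray => ['recorded-vote'], KeyAttr => [] );
for my $rv ( @{ $safe->{'recorded-vote'} } ) {
    # <legislator> has an attribute, so its text lands under 'content'
    print $rv->{legislator}{content}, ": ", $rv->{vote}, "\n";    # Smith: Yea
}
```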


I'm starting to understand hashes, but sample code would help.  Thank you.

Ken

****************

On Tue, 06 Jun 2006 11:58:21 -0400, Anthony Ettinger <[EMAIL PROTECTED]> wrote:

Since it's already in XML format, I would use XML::Simple to parse it into
a hash; then you can format it however you want by looping through the
hash.
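A minimal sketch of that suggestion: parse the XML into a hash with XML::Simple, then loop over the votes and format each one as a line. The sample document and its layout (recorded-vote elements under vote-data, party and state as attributes of legislator) are assumptions about the Clerk's format, and the members are placeholders, not real data.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Made-up sample. ASSUMPTION: the real roll call file nests
# <recorded-vote> elements under <vote-data> roughly like this.
my $xml = <<'XML';
<rollcall-vote>
  <vote-data>
    <recorded-vote>
      <legislator name-id="X000001" party="D" state="HI">Member One</legislator>
      <vote>Yea</vote>
    </recorded-vote>
    <recorded-vote>
      <legislator name-id="X000002" party="R" state="TX">Member Two</legislator>
      <vote>Nay</vote>
    </recorded-vote>
  </vote-data>
</rollcall-vote>
XML

my $doc = XMLin( $xml, ForceArray => ['recorded-vote'], KeyAttr => [] );

# Walk the array of votes and keep only the fields we care about
my @lines;
for my $rv ( @{ $doc->{'vote-data'}{'recorded-vote'} } ) {
    my $who = $rv->{legislator};    # attributes plus the text under 'content'
    push @lines, join ',', $who->{content}, $who->{party}, $who->{state}, $rv->{vote};
}
print "$_\n" for @lines;
```

In the scraper, the XML string would come from the downloaded page instead of the here-document.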

On 6/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

The script below scrapes a House of Representatives vote page, which is in
XML, and saves it as a spreadsheet that is best opened as a read-only .xls
file. How can I:

1) scrape multiple vote pages into individual spreadsheets with a single script?

2) only scrape columns C, F, G, H in the result here? I'd also prefer to
have the spreadsheet as a CSV, but that doesn't work by just changing
*.xls to *.csv. Thanks in advance.
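Renaming the file doesn't convert it: the saved file still contains XML, while a CSV file has to be plain text with one comma-separated line per row. A core-Perl sketch, assuming each row is already a Perl array (for instance built from the XML::Simple parse): it turns spreadsheet column letters into array indexes, takes a slice, and quotes fields in the usual CSV way (fields with commas, quotes or line breaks are wrapped in double quotes, embedded quotes are doubled). Which fields columns C, F, G and H hold in Excel's view of the XML isn't known here; the helper names and sample row are hypothetical. Text::CSV from CPAN is a sturdier alternative for the quoting part.

```perl
use strict;
use warnings;

# Spreadsheet column letters to a zero-based index: A => 0, C => 2, AA => 26
sub col_index {
    my ($letters) = @_;
    my $n = 0;
    $n = $n * 26 + ( ord( uc $_ ) - ord('A') + 1 ) for split //, $letters;
    return $n - 1;
}

# Quote one field if it contains a comma, a double quote or a line break
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ( $value =~ /[",\r\n]/ ) {
        $value =~ s/"/""/g;
        return qq{"$value"};
    }
    return $value;
}

# Keep only the wanted columns of one row and join them into a CSV line
sub csv_line {
    my ( $row, @wanted ) = @_;
    my @idx = map { col_index($_) } @wanted;
    return join( ',', map { csv_field($_) } @{$row}[@idx] ) . "\n";
}

# Hypothetical row with eight columns, A through H
my @row = ( 'a', 'b', 'Member One', 'd', 'e', 'D', 'HI', 'Yea' );
print csv_line( \@row, qw(C F G H) );    # Member One,D,HI,Yea
```

To write a file instead of the screen, open it with a .csv name and print each csv_line(...) result to that filehandle.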

Ken

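#!/usr/bin/perl

use strict;
use warnings;

use WWW::Mechanize;

my $output_dir   = "c:/training/bc";
my $starting_url = "http://clerk.house.gov/evs/2005/roll667.xml";

# autocheck => 1 makes get() die if the page can't be fetched
my $browser = WWW::Mechanize->new( autocheck => 1 );
$browser->get($starting_url);

# Echo the page to the screen, one line at a time
foreach my $line ( split /[\n\r]+/, $browser->content ) {
    print "$line\n";
}

# Save the same lines to a file. The contents are still XML, not a real
# Excel workbook; the .xls name only affects which program opens it.
open my $out, '>', "$output_dir/vote667.xls"
    or die "Can't open $output_dir/vote667.xls: $!";
foreach my $line ( split /[\n\r]+/, $browser->content ) {
    print {$out} "$line\n";
}
close $out or die "Can't close file: $!";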
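For question 1, a sketch that wraps the script above in a loop, one output file per roll call. Assumptions, all mine rather than from the thread: roll numbers are given on the command line (perl votes.pl 665 666 667); the Clerk zero-pads roll numbers below 100 to three digits (roll667 is unaffected); and a failed page is skipped with a warning rather than stopping the run. WWW::Mechanize is loaded with require inside the scraping sub so the URL and file-name helpers can be tried without it. The helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $output_dir = "c:/training/bc";
my $vote_year  = 2005;

# URL for one roll call. ASSUMPTION: numbers below 100 are zero-padded.
sub vote_url {
    my ( $year, $roll ) = @_;
    return sprintf "http://clerk.house.gov/evs/%d/roll%03d.xml", $year, $roll;
}

# One output file per roll call, named like the original vote667.xls
sub output_file {
    my ( $dir, $roll ) = @_;
    return "$dir/vote$roll.xls";
}

# Fetch each roll call and save it to its own file
sub scrape_votes {
    my ( $dir, $year, @rolls ) = @_;
    require WWW::Mechanize;    # loaded only when actually scraping
    my $browser = WWW::Mechanize->new( autocheck => 0 );
    for my $roll (@rolls) {
        $browser->get( vote_url( $year, $roll ) );
        unless ( $browser->success ) {
            warn "Skipping roll $roll: HTTP status ", $browser->status, "\n";
            next;
        }
        my $file = output_file( $dir, $roll );
        open my $out, '>', $file or die "Can't open $file: $!";
        print {$out} $browser->content;
        close $out or die "Can't close $file: $!";
        print "Saved roll $roll to $file\n";
    }
}

# Roll numbers come from the command line; with none, nothing is fetched
scrape_votes( $output_dir, $vote_year, @ARGV ) if @ARGV;
```

Turning autocheck off trades the automatic die for an explicit success check, so one missing roll call doesn't stop the rest of the batch.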



--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>