Re: web::scraper xpath

C.DeRykus Mon, 13 Dec 2010 16:50:34 -0800

On Dec 9, 10:00 am, ag4ve...@gmail.com (shawn wilson) wrote:
> i decided to use another module to get my data but, i'm having a bit
> of an issue with xpath.
>
> the data i want looks like this:
>
> <table class="someclass" style="width:508px;" id="Any_20">
>  <tbody>
>   <tr>
>    <td>name</td>
>    <td>attribute</td>
>
>    <td>name2</td>
>    <td>attribute2</td>
>
>    <td>possible name3</td>
>    <td>possible attribute3</td>
>
>    <td>
> ....
>    </tr><tr>
> more of the same format
>
> with this code, i'm only getting the first line of data (ie, <td> ...
> </td>). i realize that i'm only getting the first and second td which
> is fine, but how do i get multiple rows? i'm also grabbing the html
> from a file so that i don't needlessly keep hitting up their web
> server.
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use LWP::UserAgent;
> use LWP::Simple;
> use Web::Scraper;
> use Data::Dumper::Simple;
>
> my( $infile ) = $ARGV[ 0 ] =~ m/^([\ A-Z0-9_.-]+)$/ig;
>
> my $pagedata = scraper {
>    process '//*/table[@class="someclass"]', 'table[]' => scraper {
>       process '//tr/td[1]', 'name' => 'TEXT';
>       process '//tr/td[2]', 'attr' => 'TEXT';
>    };
>
> };
>
> open( FILE, "< $infile" );
>
> my $content = do { local $/; <FILE> };
>
>    my $res = $pagedata->scrape( $content )
>       or die "Can't define content to parser $!";
>
> print Dumper( $res );


I don't know Web::Scraper well, but as an alternative, here is one
possible way with XML::LibXML:

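use strict;
use warnings;
use XML::LibXML;

# file name from the command line, as in the original script
my $infile = shift @ARGV or die "usage: $0 file\n";

# parse_file expects well-formed XML; real-world HTML may need
# parse_html_file instead
my $parser  = XML::LibXML->new;
my $content = $parser->parse_file( $infile );

# single quotes outside so that @class is not interpolated by Perl
my @nodes =
   $content->findnodes( '//table[@class="someclass"]/tbody/tr' );

foreach my $node ( @nodes ) {
     print $node->toString, "\n";
}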

output:

     <tr>
         <td>name</td>
         <td>attribute</td>

         <td>name2</td>
         <td>attribute2</td>

         <td>possible name3</td>
         <td>possible attribute3</td>

         </tr>
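
If the goal is the cell text rather than the raw markup, the same
findnodes approach can be taken one step further. The following is only
a minimal sketch, not a tested solution for the real page: the inline
sample stands in for the saved file and is assumed to be well-formed,
the pairing of consecutive cells as (name, attribute) is an assumption
based on the layout shown in the question, and the variable names
(@rows, @pairs) are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Inline sample standing in for the saved page (assumed well-formed).
my $xml = <<'END';
<table class="someclass" style="width:508px;" id="Any_20">
 <tbody>
  <tr>
   <td>name</td>
   <td>attribute</td>
   <td>name2</td>
   <td>attribute2</td>
  </tr>
  <tr>
   <td>name3</td>
   <td>attribute3</td>
  </tr>
 </tbody>
</table>
END

my $doc = XML::LibXML->load_xml( string => $xml );

my @rows;
foreach my $tr ( $doc->findnodes('//table[@class="someclass"]/tbody/tr') ) {
    # Text of every cell in this row, in document order.
    my @cells = map { $_->textContent } $tr->findnodes('./td');

    # Pair the cells up as (name, attribute). This pairing is an
    # assumption taken from the sample layout in the question.
    my @pairs;
    while ( my ( $name, $attr ) = splice @cells, 0, 2 ) {
        push @pairs, { name => $name, attr => defined $attr ? $attr : '' };
    }
    push @rows, \@pairs;
}

# Print one line per (name, attribute) pair, grouped by row index.
foreach my $i ( 0 .. $#rows ) {
    print "row $i: $_->{name} => $_->{attr}\n" for @{ $rows[$i] };
}
```

Each element of @rows is an array reference holding one hash per pair,
so a row with three name/attribute pairs gives three hashes. A trailing
unpaired cell ends up with an empty attribute instead of being dropped.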

--
Charles DeRykus


