Re: Regex...HTML::Parser...Getting webpage data?

Rob Dixon Fri, 04 Aug 2006 07:30:01 -0700

Wesley Bresson wrote:
>
> I'm pretty new to Perl. My past experience has been in modifying other
> people's code to do what I want, but now I'm trying to write my own for a
> specific task that I can't find code for, and I'm having issues. I am
> trying to retrieve data from a webpage, say
> http://www.apmex.com/shop/buy/Silver_American_Eagles.asp?orderid=0 for
> example, the price of a 2006 1oz Silver American Eagle in the 20-99 price
> break quantity. Should I use Regex to do that, or would I be better off
> with HTML::Parser? I've attempted Regex since I seem to understand it
> better, but haven't had much success in getting it to pull the right
> price. I understand HTML::Parser even less than Regex, but I've read that
> it's a more reliable way of pulling webpage data. I can't seem to find
> "easy" to understand documentation on it though, so I'm even farther away
> from getting it to work than Regex. Any advice?


Two Web questions in one day! It's hard to know exactly how you're going to write
your code, Wesley, but the stuff below should be a good starter. It fetches the web
page and parses it using HTML::TreeBuilder. It looks for all table row <tr>
elements that contain exactly five table data <td> elements, which matches all the
item rows plus a few stragglers. The real item rows have an item number in the
format #9999 in the second <td> element, so the code skips every row that doesn't
look like that. Finally the description and price are pulled from the third and
fifth <td> elements, and the numeric price value is extracted with a regex.
Everything that falls within your price bracket is then printed. I didn't restrict
it to 2006 stuff as there weren't any 2006 items listed at the time I wrote this,
but it's easy to see how to do it I hope.

HTH,

Rob


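use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;

my $url = 'http://www.apmex.com/shop/buy/Silver_American_Eagles.asp?orderid=0';

# get() returns undef if the fetch fails
my $html = get($url);
die "Couldn't fetch $url\n" unless defined $html;

my $tree = HTML::TreeBuilder->new_from_content($html);

my @tr = $tree->find_by_tag_name('tr');

foreach my $tr (@tr) {

  # Item rows have exactly five cells
  my @td = $tr->find_by_tag_name('td');
  next unless @td == 5;

  # Real items carry an item number like #9999 in the second cell
  my ($number, $desc, $price) = map $_->as_trimmed_text, @td[1, 2, 4];
  next unless $number =~ /#\d+/;

  # Pull the numeric dollar amount out of the price text;
  # skip rows where no amount is found
  my ($dollars) = $price =~ /\$(\d+(?:\.\d+)?)/;
  next unless defined $dollars and $dollars >= 20 and $dollars < 100;

  print $desc, "\n", $price, "\n\n";
}

# Release the parse tree explicitly
$tree->delete;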
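
Restricting the output to 2006 items, as mentioned above, could look
something like the sketch below. It uses only core Perl so the filtering
logic can be tried without fetching the page. The sample rows and the
filter_rows helper are made up for illustration, and it assumes the year
appears as a separate word in the description text, which I haven't checked
against the live page.

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for the [description, price] pairs
# that the script above extracts from the page; not real site data.
my @rows = (
    [ '2006 1 oz Silver American Eagle', '$14.25' ],
    [ '2005 1 oz Silver American Eagle', '$25.50' ],
    [ '2006 Silver Eagle Monster Box',   '$45.00' ],
    [ 'Silver Eagle (date our choice)',  'Call'   ],
);

# Return the rows whose description mentions $year as a whole word and
# whose price text contains a dollar amount with $min <= amount < $max.
sub filter_rows {
    my ($year, $min, $max, @items) = @_;
    my @keep;
    foreach my $item (@items) {
        my ($desc, $price) = @$item;

        # Assumption: the year appears as its own word in the description
        next unless $desc =~ /\b\Q$year\E\b/;

        # Same price extraction as in the main script
        my ($dollars) = $price =~ /\$(\d+(?:\.\d+)?)/;
        next unless defined $dollars;
        next unless $dollars >= $min and $dollars < $max;

        push @keep, $item;
    }
    return @keep;
}

print "$_->[0]\n$_->[1]\n\n" for filter_rows(2006, 20, 100, @rows);
```

In the real script the same year test would sit just before the price
check inside the foreach loop, applied to $desc.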

