Re: Regex...HTML::Parser...Getting webpage data?

Wesley Bresson Sat, 05 Aug 2006 03:27:48 -0700

...

Two Web questions in one day! It's hard to know exactly how you're going to your code Wesley, but the stuff below should be a good starter. It pulls in the website and parses it using HTML::TreeBuilder. It looks for all table row <tr> elements that contain exactly five table data <td> elements, which is all the item details plus a few stragglers. The real item data has an item number in the format #9999 in the second <td> element, so ignore everything that's not like that. Finally the description and price are pulled from the relevant elements, and the numeric price value extracted with a regex. Everything that falls within your price bracket is then printed. I didn't restrict it to 2006 stuff as there weren't any at the time I wrote this, but it's easy to see how to do it I hope.
HTH,

Rob


use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;
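# get() returns undef on failure, so stop early rather than parse nothing
my $html = get('http://www.apmex.com/shop/buy/Silver_American_Eagles.asp?orderid=0')
  or die "Couldn't fetch the page\n";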
my $tree = HTML::TreeBuilder->new_from_content($html);

my @tr = $tree->find_by_tag_name('tr');

foreach my $tr (@tr) {

  my @td = $tr->find_by_tag_name('td');
  next unless @td == 5;

  my ($number, $desc, $price) = map $_->as_trimmed_text, @td[1, 2, 4];
  next unless $number =~ /#\d+/;

  my ($dollars) = $price =~ /\$([\d\.]+)/;
  # skip rows with no recognisable price, then apply the price bracket
  next unless defined $dollars and $dollars >= 20 and $dollars < 100;

  print $desc, "\n", $price, "\n\n";
}
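
The 2006 restriction mentioned above is left open in the script. A minimal sketch of one way it might be done, assuming the year shows up as a standalone four-digit number in the description text. The helper name is_2006_item and the sample descriptions are made up for illustration and are not taken from the page:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the description mentions 2006 as a
# standalone four-digit number (so "20061" would not count).
sub is_2006_item {
  my ($desc) = @_;
  return $desc =~ /\b2006\b/ ? 1 : 0;
}

# Made-up descriptions standing in for the text of the third <td>.
my @descriptions = (
  '2006 Silver American Eagle',
  '2005 Silver American Eagle',
  '1986-2006 Silver American Eagle Set',
);

foreach my $desc (@descriptions) {
  print $desc, "\n" if is_2006_item($desc);
}
```

In the loop above this would amount to a line such as "next unless is_2006_item($desc);" after the item-number check. Note that a range like 1986-2006 also matches, which may or may not be wanted.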

Thanks for your example script using HTML::TreeBuilder; however, I'm trying to figure out why it appears to grab some items but not others. I've removed the $20-100 limitation (I didn't need it, I really just need to poll one item) but am still missing some of the items. For example, the most obvious are the two 1986-2006 eagles at the top of the page: the script grabs one but not the other. Any idea why? Does it have to do with it looking for the five <td>s?
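
One way to check whether the five-<td> test is involved is to print the cell counts for every row that carries an item number. The sketch below runs on a made-up HTML fragment, not the real page, and the helper own_cells is hypothetical. It relies on find_by_tag_name returning every matching descendant, so a row whose cell contains a nested table reports more <td> elements than it has cells of its own:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up fragment, NOT the real page: the second row has a nested table
# inside one cell, so it has five cells of its own but more <td>
# descendants.
my $html = <<'HTML';
<table>
<tr><td>a</td><td>#1001</td><td>Plain row</td><td>b</td><td>$25.00</td></tr>
<tr><td>a</td><td>#1002</td><td>Nested row</td><td><table><tr><td>x</td></tr></table></td><td>$30.00</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Hypothetical helper: return only the <td> elements that are direct
# children of the row, ignoring the cells of any nested table.
sub own_cells {
  my ($tr) = @_;
  return grep { ref $_ and $_->tag eq 'td' } $tr->content_list;
}

foreach my $tr ($tree->find_by_tag_name('tr')) {
  my @all = $tr->find_by_tag_name('td');   # includes nested cells
  my @own = own_cells($tr);                # direct children only
  next unless @own >= 2 and $own[1]->as_trimmed_text =~ /#\d+/;
  printf "%s: %d own cells, %d cells in total\n",
    $own[1]->as_trimmed_text, scalar @own, scalar @all;
}

$tree->delete;
```

If a missing item on the real page shows a total other than five, that would point at the "@td == 5" test. Whether that is what happens on the APMEX page is not something this sketch can show.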




