Wesley Bresson wrote:
>
> Thanks for your example script using HTML::Treebuilder, however I'm
> trying to figure out why it appears to grab some items but not others.
> I've removed the $20-100 limitation (I didn't need it, I really just
> need to poll one item) but am still missing some of the items. For
> example, the most obvious, are the 2 1986-2006 eagle at the top of the
> page, the script grabs one but not the other, any idea why ? Does it
> have to do with it looking for the 5 td's ?

Hello Wesley.

The script fails because the site is an appalling example of HTML and
HTML::TreeBuilder cannot parse it successfully. There are many spurious
closing tags without matching opening ones, as well as a lot of missing
closing tags; the page as a whole simply doesn't hold together.

I have managed to establish that the HTML tables containing the pricing
information will parse on their own, so I offer this hack to get the
information you need. It works by scanning the input line by line and
extracting just the pricing tables, then submitting only those to
HTML::TreeBuilder. A counter tracks how deeply nested we are, so that
tables inside a pricing table are kept as well.

It's not pretty but it will probably suffice for what you need. Please
buy from these people: they need your money for better Web development
staff!

Cheers,

Rob


use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;

my $url  = 'http://www.apmex.com/shop/buy/Silver_American_Eagles.asp?orderid=0';
my $html = get($url);
die "Unable to fetch $url\n" unless defined $html;

my @newhtml;
my $in_table = 0;    # nesting depth inside a pricing table

foreach (split /\n/, $html) {

  # Skip lines that consist solely of an HTML comment
  next if /^\s*<!--.*-->\s*$/;

  # Start counting at a pricing table; count nested tables once inside
  if (m%<table\b%) {
    $in_table++ if /"pricesTable"/ or $in_table;
  }

  if ($in_table) {
    push @newhtml, $_;
    $in_table-- if m%</table\b%;
  }
}

# Rejoin with newlines so text on adjacent lines doesn't run together
my $tree = HTML::TreeBuilder->new_from_content(join "\n", @newhtml);

my @table = $tree->look_down(_tag => 'table', id => 'pricesTable');

foreach my $table (@table) {
  my @content = $table->content_list;
  foreach my $elem (@content) {
    print $elem->as_trimmed_text, "\n";
  }
}

$tree->delete;
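
If you want to see the filtering step on its own, without fetching the page, something like the sketch below may help. It is only an illustration of the depth-counter idea: the extract_tables sub, its $marker parameter and the sample HTML are all made up for this example (they are not from the site or from HTML::TreeBuilder), and unlike the script above it matches tags case-insensitively.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that fall inside a table whose
# opening tag line contains $marker, tracking the nesting depth as we go.
sub extract_tables {
    my ($html, $marker) = @_;
    my @kept;
    my $depth = 0;

    foreach my $line (split /\n/, $html) {
        # Count an opening <table> only if it carries the marker,
        # or if we are already inside a table we are keeping.
        if ($line =~ /<table\b/i) {
            $depth++ if $depth or index($line, $marker) >= 0;
        }
        if ($depth) {
            push @kept, $line;
            $depth-- if $line =~ m{</table\b}i;
        }
    }
    return @kept;
}

# Invented sample: stray closing tags and an unclosed table
# surrounding one well-formed pricing table.
my $sample = join "\n",
    '</div></td>',
    '<table id="nav"><tr><td>Home',
    '<table id="pricesTable">',
    '<tr><td>1 oz Silver Eagle</td><td>$20.00</td></tr>',
    '</table>',
    '</font></p>';

my @kept = extract_tables($sample, '"pricesTable"');
print "$_\n" for @kept;
```

Only the three lines of the pricing table should come out; the unclosed "nav" table and the stray closing tags are dropped before a parser ever sees them. Like the script, this assumes each tag sits on its own line, so it would need rework for pages that put several tables on one line.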