On 28/03/2005, Daniel Smith wrote:
> I was tasked with parsing a set of .html files in order to extract
> the data contained within some terribly formatted tables.
[...]
> Can anyone shed some light?
I used HTML::TreeBuilder on a similar project once:
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
# parse_file returns undef if the file cannot be opened
$tree->parse_file('yourfile.html') or die "Cannot open file: $!";

# Get tables
my @tables = $tree->look_down( '_tag', 'table' );
for my $t (@tables) {
    # Get rows (this also picks up rows of any nested tables)
    my @rows = $t->look_down( '_tag', 'tr' );
    for my $r (@rows) {
        print "Row contents:\n";
        # Get 'th' and 'td' cells; the regex is anchored so that
        # only those two tag names match
        my @cells = $r->look_down( '_tag', qr/^(?:th|td)$/ );
        for my $c (@cells) {
            print "\t", $c->as_text(), "\n";
        }
    }
}

# Free the tree explicitly once finished
$tree->delete();
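
If you need the data itself rather than a printout, something along these lines might be a starting point. It is only a sketch: table_rows is a helper name I made up (it is not part of HTML::TreeBuilder), the sample HTML is invented, and I am assuming you want rows of nested tables kept separate from the rows of the outer table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical helper: return one array ref per row of $table,
# each holding the text of that row's own th/td cells.
sub table_rows {
    my ($table) = @_;
    my @rows;
    for my $tr ( $table->look_down( _tag => 'tr' ) ) {
        # Skip rows whose nearest enclosing table is a nested one
        my $owner = $tr->look_up( _tag => 'table' );
        next unless $owner && refaddr($owner) == refaddr($table);

        # Direct children only, so cells of nested tables are not repeated
        my @cells = grep { ref $_ && $_->tag =~ /^(?:th|td)$/ }
            $tr->content_list;
        push @rows, [ map { $_->as_text } @cells ];
    }
    return @rows;
}

# Invented sample input, just to show the call
my $html = '<table><tr><th>Name</th><th>Qty</th></tr>'
         . '<tr><td>apple</td><td>3</td></tr></table>';
my $tree = HTML::TreeBuilder->new_from_content($html);

for my $table ( $tree->look_down( _tag => 'table' ) ) {
    for my $row ( table_rows($table) ) {
        print join( "\t", @$row ), "\n";
    }
}
$tree->delete;
```

Each row comes back as an array ref, so you can push it straight into whatever structure the rest of your program expects, or join it for CSV-style output.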
--
felix