On 28/03/2005, Daniel Smith wrote:
> I was tasked with parsing a set of .html files in order to extract
> the data contained within some terribly formatted tables.
> [...]
> Can anyone shed some light?

I used HTML::TreeBuilder on a similar project once:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file('yourfile.html')
        or die "Cannot open file: $!";

    # Get tables
    my @tables = $tree->look_down('_tag', 'table');
    for my $t (@tables) {
        # Get rows (look_down searches all descendants, so rows of a
        # nested table are also visited under the outer table)
        my @rows = $t->look_down('_tag', 'tr');
        for my $r (@rows) {
            print "Row contents:\n";
            # Get 'th' and 'td' cells; the regex is anchored so only
            # those exact tag names match
            my @cells = $r->look_down('_tag', qr/^(?:th|td)$/);
            for my $c (@cells) {
                print "\t", $c->as_text(), "\n";
            }
        }
    }

    # Free the parse tree explicitly
    $tree->delete();

-- 
felix