Randy W. Sims wrote:
On 7/21/2004 11:24 PM, Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build a HTML editor for use with my HTML::Mason site. I intend for it to support nested tables, SPANs, and anchors. I am looking for a module that can help me parse existing HTML (custom or generated by my scripts) into a tree structure similar to:
my $html = [ { tag => 'table', id => 'maintable', width => 300, content =>
[ { tag => 'tr', content =>
[
{ tag => 'td', width => 200, content => "some content" },
{ tag => 'td', width => 100, content => "more content" }
]
]
]; # Not tested, but you get the idea
[snip]
I'd rather generate a structure similar to what I have above instead of having a large tree of class objects that takes up more RAM and is probably slower. How would I go about generating a structure such as that above using HTML::Parser?
Parsers like HTML::Parser scan a document and upon encountering certain tokens fire off events. In the case of HTML::Parser, events are fired when encountering a start tag, the text between tags, and at the end tag. If you have an arbitrarily deep document structure like HTML, you can store the structure using a stack:
<SNIP>
Thanks. In the time it took you to put that together, I came up with the following to figure out how HTML::Parser works. I'll use your code to expand upon it.
<SNIP>
Here is my current working code. Please take a look at it and see if there are any obvious (or not so obvious) problems. I thought this would end up being far more difficult.
parsehtml.pl ============ #!/usr/bin/perl
use strict; use warnings;
use HTML::Parser ();
my $htmltree = [ { tag => 'document', content => [] } ]; my $node = $htmltree->[0]->{content}; my @prevnodes = ($htmltree);
sub start { my $tagname = shift; my $attr = shift; my $newnode = {};
$newnode->{tag} = $tagname; foreach my $key(keys %{$attr}) { $newnode->{$key} = $attr->{$key}; } $newnode->{content} = []; push @prevnodes, $node; push @{$node}, $newnode; $node = $newnode->{content}; }
sub end { my $tagname = shift;
$node = pop @prevnodes; }
sub text { my $text = shift;
chomp $text; if($text ne '') { push @{$node}, $text; } }
my $p = HTML::Parser->new( api_version => 3, start_h => [\&start, "tagname, attr"], end_h => [\&end, "tagname"], text_h => [\&text, "dtext"] );
$p->parse_file("test.html");
use Data::Dumper; print Dumper $htmltree;
test.html ========= <table id="maintable" width="300"> <tr> <td width="200">some content</td> <td width="100">more content</td> </tr> </table>
-- Andrew Gaffney Network Administrator Skyline Aeronautics, LLC. 636-357-1548
-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>