On 7/21/2004 11:24 PM, Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build an HTML editor for use with my HTML::Mason site. I intend for it to support nested tables, SPANs, and anchors. I am looking for a module that can help me parse existing HTML (custom or generated by my scripts) into a tree structure similar to:
my $html = [
    { tag => 'table', id => 'maintable', width => 300, content => [
        { tag => 'tr', content => [
            { tag => 'td', width => 200, content => "some content" },
            { tag => 'td', width => 100, content => "more content" },
        ] },
    ] },
]; # Not tested, but you get the idea
[snip]
I'd rather generate a structure similar to what I have above instead of having a large tree of class objects that takes up more RAM and is probably slower. How would I go about generating a structure such as that above using HTML::Parser?
Parsers like HTML::Parser scan a document and fire events when they encounter certain tokens. HTML::Parser fires an event for each start tag, for the text between tags, and for each end tag. Because HTML can nest to an arbitrary depth, you can keep track of the structure with a stack:
#!/usr/bin/perl
package SampleParser;

use strict;

use HTML::Parser;
use base qw(HTML::Parser);

# Called for every start tag: print it at the current depth, then go one level deeper.
sub start {
    my($self, $tagname, $attr, $attrseq, $origtext) = @_;
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "<$tagname>\n";
    push @{$self->{_stack}}, ' ';
}

# Called for every end tag: come back up one level, then print the closing tag.
sub end {
    my($self, $tagname, $origtext) = @_;
    pop @{$self->{_stack}};
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "</$tagname>\n";
}

1;

package main;

use strict;
use warnings;

my $p = SampleParser->new();
$p->parse_file(\*DATA);

__DATA__
<html>
<head>
<title>Title</title>
</head>
<body>
The body.
</body>
</html>
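The same stack idea can hold the nodes themselves instead of a depth marker, which gives a nested structure close to the one asked about. What follows is only a minimal sketch under a few assumptions: the function name build_tree and the '#root' sentinel node are made up for illustration, every node's content is always an array reference (rather than a plain string as in the hand-written example), and whitespace-only text is dropped. An attribute literally named "tag" or "content" would collide with the node's own keys, and void elements such as <br> stay open until an enclosing end tag arrives, so real-world HTML would need more care.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Build a nested array/hash tree from an HTML string.
# Elements become { tag => ..., <attributes>, content => [ ... ] };
# text becomes a plain string inside the parent's content array.
sub build_tree {
    my ($html) = @_;

    my $root  = { tag => '#root', content => [] };   # hypothetical sentinel node
    my @stack = ($root);                              # innermost open element is last

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            # Copy the attributes so the node owns its own hash.
            my $node = { %$attr, tag => $tagname, content => [] };
            push @{ $stack[-1]{content} }, $node;     # attach to the current parent
            push @stack, $node;                       # descend into the new element
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tagname) = @_;
            # Find the nearest open element with this name and close it,
            # along with anything left open inside it. Stray end tags are ignored.
            for my $i (reverse 1 .. $#stack) {
                if ($stack[$i]{tag} eq $tagname) {
                    splice @stack, $i;
                    last;
                }
            }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            push @{ $stack[-1]{content} }, $text if $text =~ /\S/;
        }, 'dtext' ],
    );
    $p->unbroken_text(1);    # deliver each run of text as a single event

    $p->parse($html);
    $p->eof;
    return $root->{content};
}

my $tree = build_tree(
    '<table id="maintable" width="300"><tr>'
  . '<td width="200">some content</td><td width="100">more content</td>'
  . '</tr></table>'
);

print $tree->[0]{tag}, "\n";
print $tree->[0]{content}[0]{content}[1]{content}[0], "\n";
```

Keeping content as an array everywhere makes the structure uniform to walk, at the cost of one extra level of indexing to reach the text of a simple cell.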
Thanks. In the time it took you to put that together, I came up with the following to figure out how HTML::Parser works. I'll use your code to expand upon it.
#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser ();

sub start {
    print "start ";
    foreach my $arg (@_) {
        if(ref($arg) eq 'HASH') {
            foreach my $key(keys %{$arg}) {
                print " $key - $arg->{$key}\n";
            }
        } else {
            print "$arg\n";
        }
    }
}

sub end {
    print "end ";
    foreach(@_) {
        print "$_\n";
    }
}

sub text {
    my $text = shift;
    chomp $text;
    print " text - '$text'\n" if($text ne '');
}

my $p = HTML::Parser->new(
    api_version     => 3,
    start_h         => [\&start, "tagname, attr"],
    end_h           => [\&end, "tagname"],
    text_h          => [\&text, "dtext"],
    marked_sections => 1 ); # Not sure what this does

$p->parse_file("test.html");
The above gives me the expected output for the sample HTML I provided before.
-- Andrew Gaffney Network Administrator Skyline Aeronautics, LLC. 636-357-1548