Randy W. Sims wrote:
On 7/21/2004 11:24 PM, Andrew Gaffney wrote:

Randy W. Sims wrote:

On 7/21/2004 10:42 PM, Andrew Gaffney wrote:

I am trying to build an HTML editor for use with my HTML::Mason site. I intend for it to support nested tables, SPANs, and anchors. I am looking for a module that can help me parse existing HTML (custom or generated by my scripts) into a tree structure similar to:

my $html = [ { tag => 'table', id => 'maintable', width => 300, content =>
               [ { tag => 'tr', content =>
                   [
                     { tag => 'td', width => 200, content => "some content" },
                     { tag => 'td', width => 100, content => "more content" }
                   ]
                 }
               ]
             }
           ]; # Not tested, but you get the idea
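This layout is easy to consume with a recursive walk. As a minimal sketch, assuming every key other than `tag` and `content` is an attribute and that `content` is either a plain string or an arrayref of child nodes, the structure can be turned back into markup. `to_html` is a hypothetical helper name; `encode_entities` comes from HTML::Entities, which ships in the same distribution as HTML::Parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Entities qw(encode_entities);

# Hypothetical helper: turn a node (hashref), a list of nodes (arrayref),
# or a plain text string back into HTML markup.
sub to_html {
    my ($node) = @_;

    # A list of nodes: serialize each one and concatenate the results.
    return join '', map { to_html($_) } @$node if ref $node eq 'ARRAY';

    # Plain text: escape characters such as <, > and &.
    return encode_entities($node) unless ref $node;

    # An element: every key except 'tag' and 'content' is an attribute.
    my %attr    = %$node;
    my $tag     = delete $attr{tag};
    my $content = delete $attr{content};
    my $attrs   = join '',
                  map { sprintf ' %s="%s"', $_, encode_entities($attr{$_}) }
                  sort keys %attr;    # sorted so the output is deterministic

    # Always emits an end tag, so void elements like <br> are not handled.
    my $inner = defined $content ? to_html($content) : '';
    return "<$tag$attrs>$inner</$tag>";
}

my $html = [ { tag => 'table', id => 'maintable', width => 300, content =>
               [ { tag => 'tr', content =>
                   [ { tag => 'td', width => 200, content => "some content" },
                     { tag => 'td', width => 100, content => "more content" } ] } ] } ];

print to_html($html), "\n";
```

For the structure above this prints the table back as a single line of markup, which also shows that the nested hashes carry everything an editor needs to write the page out again.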



[snip]

I'd rather generate a structure similar to what I have above instead of having a large tree of class objects that takes up more RAM and is probably slower. How would I go about generating a structure such as that above using HTML::Parser?


Parsers like HTML::Parser scan a document and fire off events as they encounter certain tokens. HTML::Parser fires an event at each start tag, at the text between tags, and at each end tag. Because HTML can nest arbitrarily deep, you can track the structure with a stack: push on each start tag and pop on each end tag. For example:

#!/usr/bin/perl
package SampleParser;

use strict;

use HTML::Parser;
use base qw(HTML::Parser);

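# Called for every start tag: print it indented by the current depth,
# then push it so that everything inside it is indented one level deeper.
sub start {
    my($self, $tagname, $attr, $attrseq, $origtext) = @_;
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "<$tagname>\n";
    push @{$self->{_stack}}, $tagname;
}

# Called for every end tag: pop back to the enclosing depth, then print it.
sub end {
    my($self, $tagname, $origtext) = @_;
    pop @{$self->{_stack}};
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "</$tagname>\n";
}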

1;

package main;

use strict;
use warnings;

my $p = SampleParser->new();
$p->parse_file(\*DATA);

__DATA__
<html>
<head>
<title>Title</title>
</head>
<body>
The body.
</body>
</html>
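Going from printing to building the structure asked for earlier only means that each start event attaches a new node to whatever element is on top of the stack. The following is a minimal sketch of that with HTML::Parser's version 3 handler API (the same start_h/end_h/text_h constructor arguments used elsewhere in this thread), not a full HTML tree builder; `build_tree`, the `#root` node and the `%VOID` list are hypothetical names chosen for the sketch. It departs from the proposed layout in two deliberate ways: attributes live in a separate `attr` hash, because flattening them next to `tag` and `content` would collide with real attributes such as `<meta content="...">`, and `content` is always an arrayref, so text and child elements can be mixed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser ();
use Data::Dumper;

# Elements that have no end tag: they are added to the tree but never
# pushed on the stack (assumed list, covering the common cases).
my %VOID = map { $_ => 1 } qw(area base br col embed hr img input link meta param);

# Hypothetical helper: parse an HTML string into nested hashes of the form
#   { tag => NAME, attr => { ... }, content => [ CHILD, ... ] }
# where each CHILD is either another such hash or a plain text string.
sub build_tree {
    my ($html) = @_;

    my $root  = { tag => '#root', attr => {}, content => [] };
    my @stack = ($root);    # $stack[-1] is the innermost open element

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each run of text in one piece
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            my $node = { tag => $tagname, attr => {%$attr}, content => [] };
            push @{ $stack[-1]{content} }, $node;    # attach to current parent
            push @stack, $node unless $VOID{$tagname};
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tagname) = @_;
            # Pop everything up to and including the matching open tag, so
            # elements left unclosed inside it are closed here too. An end
            # tag with no matching open tag is ignored.
            for my $i (reverse 1 .. $#stack) {
                if ($stack[$i]{tag} eq $tagname) {
                    splice @stack, $i;
                    last;
                }
            }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            # Skip whitespace-only text between tags.
            push @{ $stack[-1]{content} }, $text if $text =~ /\S/;
        }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;    # flush any text still buffered by the parser

    return $root->{content};
}

$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;
print Dumper(build_tree(
    '<table id="maintable" width="300"><tr>'
  . '<td width="200">some content</td><td width="100">more content</td>'
  . '</tr></table>'
));
```

This only closes an element when its own end tag, or an enclosing element's end tag, arrives. It does not apply HTML's rules for omitted end tags, so `<li>a<li>b` nests the second item inside the first.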

Thanks. In the time it took you to put that together, I came up with the following to figure out how HTML::Parser works. I'll use your code to expand upon it.


#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser ();

sub start {
  print "start ";
  foreach my $arg (@_) {
    if(ref($arg) eq 'HASH') {
      foreach my $key(keys %{$arg}) {
        print "  $key - $arg->{$key}\n";
      }
    } else {
      print "$arg\n";
    }
  }
}

sub end {
  print "end ";
  foreach(@_) {
    print "$_\n";
  }
}

sub text {
  my $text = shift;

  chomp $text;
  print "  text - '$text'\n" if($text ne '');
}

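# Each handler is [ \&callback, "argspec" ]; the argspec lists the values
# HTML::Parser passes to the callback ("dtext" is text with entities decoded).
my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           text_h  => [\&text,  "dtext"],
                           marked_sections => 1 ); # recognize <![CDATA[...]]> marked sections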

$p->parse_file("test.html");

The above gives me the expected output for the sample HTML I provided before.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

