I think it's pretty safe to say there are definitely some issues with HTML::Parser and mod_perl, at least when subclassing it:
---
package PackageName;
use HTML::Parser;
@PackageName::ISA = qw(HTML::Parser);
---
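For context, the whole point of that @ISA line is usually to override the parser's version-2-style handler methods, roughly as sketched below (the signatures follow the HTML::Parser docs; the method bodies are just placeholders, not the actual code):
---
package PackageName;
use HTML::Parser;
@PackageName::ISA = qw(HTML::Parser);

# With the version-2 compatibility API, HTML::Parser calls these
# methods on the object during parse()/parse_file().
sub start { my ($self, $tag, $attr, $attrseq, $origtext) = @_; ... }
sub end   { my ($self, $tag, $origtext) = @_; ... }
sub text  { my ($self, $text) = @_; ... }
---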
I ended up using a somewhat different approach, something like:
---
package PackageName;
use HTML::Parser;

# Hold an HTML::Parser instance instead of inheriting from one.
sub new {
    my $self = bless {}, shift;
    $self->{parser} = HTML::Parser->new(
        api_version => 3,
        start_h => [\&start, "self, tagname, attr, attrseq, text"],
        end_h   => [\&end,   "self, tagname, text"],
        text_h  => [\&text,  "self, text, is_cdata"],
    );
    return $self;
}

# Delegate parsing to the wrapped parser; note that the handlers
# receive the inner HTML::Parser object as "self".
sub parse_file { shift->{parser}->parse_file(@_); }

sub start { ... }
sub end   { ... }
sub text  { ... }
---
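It gets used the same way the subclass did; for example (the filename and the per-request setup below are just illustrative, not from the original code):
---
# e.g. inside a Mason component or mod_perl handler:
my $p = PackageName->new;              # fresh parser object for this request
$p->parse_file("/path/to/page.html");  # start/end/text handlers fire during the parse
---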
On Sep 22, 2005, at 2:30 PM, Mike Henderson wrote:
> Hello, just a quick question...
>
> Has anyone out there successfully deployed HTML::Parser in an Apache
> 1.3.x / mod_perl / HTML::Mason environment (dynamically parsing pages)?
>
> I realize that the module itself is kind of clunky, and additionally
> an XS module, so I'm left wondering.
>
> Basically, what I'm seeing is everything working as you'd expect on
> the first load of the page which creates and uses an HTML::Parser
> object, but, on any subsequent loads from that same apache child,
> things are partially broken -- specifically, during parsing, callbacks
> to text() don't seem to be happening, but callbacks to start() and
> end() seem to work fine.
>
> I'm wondering if there's any way around this -- that is, any way to
> completely destroy any previous data that HTML::Parser is letting
> linger that's causing a problem, and reload the module. I'm not sure
> about the feasibility of this, due to it being XS.
I have seen odd behavior using Netscape::Bookmarks (which uses
HTML::Parse to parse the file) under mod_perl 1.3.x and Mason. I
thought maybe it was my own code, but what you're saying reminds me
that we sometimes got garbage back from a parse.
Barry Hoggard