I think it's pretty safe to say there are definitely some issues with HTML::Parser under mod_perl, at least when subclassing it.
 
I managed to kludge around the problem by not subclassing -- i.e. not doing:

 ---
 package PackageName;

 use HTML::Parser;

 @PackageName::ISA = qw(HTML::Parser);
 ---


I ended up using a somewhat different approach, something like:
 
---

 package PackageName;

 use HTML::Parser;

 sub new {
  my $SELF_PackageName = bless {}, shift;
  $SELF_PackageName->{parser} = HTML::Parser->new( api_version => 3,
                                                   start_h => [\&start, "self, tagname, attr, attrseq, text"],
                                                   end_h   => [\&end, "self, tagname, text" ],
                                                   text_h  => [\&text, "self, text, is_cdata"]
                                                 );
  return $SELF_PackageName;
 }

 sub parse_file { shift->{parser}->parse_file(@_); }

 sub start { ... }
 sub end  { ... }
 sub text { ... }

---
 
It got a bit weird after that: because of the "self" in the argspecs, the HTML::Parser callbacks pass the actual HTML::Parser instance (not the PackageName object) back to the PackageName routines, so I end up storing all of my data in the HTML::Parser namespace ... but it works! :) ... and this is why we love perl.
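
A rough sketch of one possible way to keep the data on the wrapper object instead: hand HTML::Parser closures rather than plain sub references, and drop "self" from the argspecs. This is only an illustration and has not been tried under mod_perl. The tags and text fields and the parse_string method are made-up names, and the weaken() call is there only to avoid a reference cycle between the wrapper and the parser.

```perl
package PackageName;

use strict;
use warnings;
use HTML::Parser ();
use Scalar::Util qw(weaken);

sub new {
    my $class = shift;

    # Hypothetical fields, just so the callbacks have somewhere to put data.
    my $self = bless { tags => [], text => '' }, $class;

    # The closures capture a weak copy, so the wrapper and the parser
    # do not keep each other alive forever.
    weaken(my $weak = $self);

    # No "self" in the argspecs: the closures supply the wrapper object.
    $self->{parser} = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $weak->start(@_) }, "tagname, attr, attrseq, text" ],
        end_h   => [ sub { $weak->end(@_) },   "tagname, text" ],
        text_h  => [ sub { $weak->text(@_) },  "text, is_cdata" ],
    );
    return $self;
}

sub parse_file { my $self = shift; $self->{parser}->parse_file(@_); return $self }

# Hypothetical helper for parsing an in-memory string; eof() flushes
# any text the parser is still buffering.
sub parse_string {
    my ($self, $html) = @_;
    $self->{parser}->parse($html);
    $self->{parser}->eof;
    return $self;
}

# Here $self is the PackageName object, not the HTML::Parser instance.
sub start { my ($self, $tagname) = @_; push @{ $self->{tags} }, $tagname }
sub end   { }
sub text  { my ($self, $text) = @_; $self->{text} .= $text }
```

With something like this, PackageName->new->parse_string($html) would leave the collected tag names and text on the PackageName object itself. Whether it changes anything about the mod_perl misbehaviour is a separate question.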
 
Thanks guys.
 
On 9/22/05, Barry Hoggard <[EMAIL PROTECTED]> wrote:
On Sep 22, 2005, at 2:30 PM, Mike Henderson wrote:

> Hello, just a quick question...
>
> Has anyone out there successfully deployed HTML::Parser in an apache
> 1.3.x / mod_perl / HTML::Mason environment (dynamically parsing pages)
> ?
>
> I realize that the module itself is kind of crunky, and additionally
> an XS module, so I'm left wondering.
>
> Basically, what I'm seeing is everything working as you'd expect on
> the first load of the page which creates and uses an HTML::Parser
> object, but, on any subsequent loads from that same apache child,
> things are partially broken -- specifically, during parsing, callbacks
> to text() don't seem to be happening, but callbacks to start() and
> end() seem to work fine.
>
> I'm wondering if there's any way around this -- that is, any way to
> completely destroy any previous data that HTML::Parser is letting
> linger that's causing a problem, and reloading the module. Not sure
> about the feasibility of this due to it being XS.


I have seen odd behavior using Netscape::Bookmarks (which uses
HTML::Parse to parse the file) under mod_perl 1.3.x and Mason.  I
thought it was my code maybe, but what you are saying reminds me that
we got garbage back sometimes from a parse.


Barry Hoggard


