RE: xml problem

Grant McLean Wed, 20 Jun 2001 03:00:25 -0700
From: Nigel Wetters [mailto:[EMAIL PROTECTED]]
> I don't know about how XML::Parser handles memory - last time 
> I tried to use it to parse content.rdf from http://dmoz.org , 
> it soaked up all my memory, then bombed. Sometimes, you need 
> to write your own parsing subs :)

A casual reader could take that to imply that XML::Parser has
memory leaks.  If you have a test case which demonstrates 
XML::Parser leaking memory then please forward it to the
maintainer (Clark Cooper).  My experience of XML::Parser is 
that it is extremely solid and much faster than regexes for 
serious parsing.

Is the file you referred to a really big file?  If so, any 
parsing module that produced a tree (eg: XML::Parser in Tree 
style, XML::DOM, XML::Simple etc) would require memory to the 
tune of 'n' times the number of bytes in the original file 
(where 'n' is probably a bigger number than you'd guess).

When dealing with large XML files, an event-based parser (eg:
XML::Parser in native mode, the SAX modules) or a hybrid event/
tree module like XML::Twig is a better fit, since neither needs
to hold the whole document in memory at once.
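
To make the event-based idea concrete, here is a minimal sketch.
It uses Python's standard xml.sax module as a stand-in for the
Perl modules named above, so it is an analogue rather than their
API. The element name "Topic" and the helper count_topics are
assumptions made up for the illustration.

```python
import io
import xml.sax


class TopicCounter(xml.sax.ContentHandler):
    """Counts start tags as they stream past; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        if name == "Topic":  # hypothetical element name
            self.count += 1


def count_topics(stream):
    """Parse a file-like object and return the number of Topic elements."""
    handler = TopicCounter()
    xml.sax.parse(stream, handler)
    return handler.count


sample = b"<RDF><Topic><Title>One</Title></Topic><Topic><Title>Two</Title></Topic></RDF>"
print(count_topics(io.BytesIO(sample)))  # prints 2
```

The handler keeps only a counter, so memory use stays roughly
flat however large the input is. The price is that you have to
track any context you care about yourself.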

But I can only echo what other people have said about parsing
XML with regexes - just don't do it!  There are plenty of things
that might exist in an XML document that are really hard to cope
with using regexes (eg: encoding, entity definitions, entity 
expansion, CDATA sections etc).  If your regexes work with simple 
XML, you'll come to rely on them and then they'll break because 
you give them a document with a euro symbol.  So you fix that and 
they break when you get a document with an encoding declaration 
you weren't prepared for. And on it goes.  I can heartily recommend 
using XML parsers for parsing XML.
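
A small sketch of how that goes wrong, again in Python with only
the standard library (the same failure applies to a Perl regex).
The document, the element names and the two helper functions are
invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# A CDATA section containing markup-like text, plus a character
# reference (the euro sign) and a predefined entity.
doc = (
    "<item>"
    "<name><![CDATA[a <name>b</name> c]]></name>"
    "<price>5 &#8364; &amp; up</price>"
    "</item>"
)


def regex_extract(xml_text, tag):
    """Naive approach: grab whatever sits between the first tag pair."""
    match = re.search(r"<%s>(.*?)</%s>" % (tag, tag), xml_text)
    return match.group(1) if match else None


def parser_extract(xml_text, tag):
    """Let a real XML parser handle CDATA and entity expansion."""
    return ET.fromstring(xml_text).findtext(tag)


print(regex_extract(doc, "name"))    # <![CDATA[a <name>b
print(parser_extract(doc, "name"))   # a <name>b</name> c
print(regex_extract(doc, "price"))   # 5 &#8364; &amp; up
print(parser_extract(doc, "price"))  # 5 € & up
```

The regex stops at the closing tag inside the CDATA section and
leaves the references unexpanded. The parser gets both right
without any special handling.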

Notwithstanding that rant (sorry, I'm over it now), as the author
of XML::Simple, I *can't* recommend it for the problem originally
posted.  The sample XML used mixed content (elements containing
both text and nested elements) which XML::Simple does not (and
will not) support.
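
To show why mixed content fits poorly into a plain data structure,
here is a sketch in Python. The function naive_to_dict is a
hypothetical flattening written for this example; it is not how
XML::Simple works internally. The sample paragraph is invented.

```python
import xml.etree.ElementTree as ET


def naive_to_dict(elem):
    """Hypothetical flattening: child tag -> text, loose text joined."""
    result = {child.tag: child.text for child in elem}
    # Text before the first child, then the text following each child.
    loose = [elem.text or ""] + [child.tail or "" for child in elem]
    result["content"] = "".join(loose)
    return result


para = ET.fromstring("<p>Use <b>parsers</b>, not <i>regexes</i>, for XML.</p>")

print(naive_to_dict(para))
# {'b': 'parsers', 'i': 'regexes', 'content': 'Use , not , for XML.'}

print("".join(para.itertext()))
# Use parsers, not regexes, for XML.
```

Once the text and the child elements are stored under separate
keys, the information about where each child sat within the text
is gone. A tree or event API keeps that order.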

The two modules I'd be inclined to recommend for the original
problem are XML::XPath and XML::Twig.  The former is more
standards-based and the latter is more Perlish.  I'd have to see
some of the actual XML and the required output before I went any
further.

Regards
Grant