Re: XML::Reader

Klaus Tue, 11 May 2010 06:59:56 -0700

On Tue, May 11, 2010 at 7:49 AM, Jonathan Rockway <j...@jrock.us> wrote:

> * On Tue, Apr 27 2010, Klaus wrote:
> > I have released XML::Reader (ver 0.34)
> > http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm
>

By the way, I have now released a new version of XML::Reader (ver 0.35)
with some bug fixes, some warts removed, a relicensing, etc.
http://search.cpan.org/~keichner/XML-Reader-0.35/lib/XML/Reader.pm

> > To explain the module, I have created a small demonstration program
> > that extracts XML-subtrees (for example any path that ends with '/.../
> > a') memory efficiently.
> >
> > An XML document can be very large (possibly many gigabytes), but is
> > composed of XML-subtrees, each of which is only a few kilobytes in
> > size. The demonstration program reads XML-subtrees one by one, only
> > the memory for one subtree is held at a time. Each subtree can then be
> > processed further at your convenience (for example by using regular
> > expressions, or, by using other XML-Modules, such as XML::Simple). In
> > principle, XML::Reader has no event driven callback functions, you
> > have to loop over the XML-document yourself and the resulting XML-
> > subtree is represented in text format.
>
> So apparently I am rather behind on module-authors, but I just thought
> I'd ask if you've taken a look at XML::Twig?  That seems to be the main
> module for this sort of thing, and seems to have an established
> userbase.  Maybe patches to that would be more productive than
> reinventing the wheel?

Thanks for your message.

I would position XML::Reader in the same space as XML::Twig and
XML::TokeParser.

I have taken a look at XML::Twig, which has an established userbase,
and I agree that XML::Reader duplicates many of the features
already provided by XML::Twig.

However, unlike XML::Twig, XML::Reader does not rely on callback
functions to parse the XML. With XML::Reader you loop over the
XML-document yourself and the resulting XML-elements (and/or
XML-subtrees) are represented in text format. This style of processing
XML is similar to the classic pattern:

"open my $fh, '<', 'file.txt'; while (<$fh>) { do_sth($_); } close $fh;"

This pattern is also implemented by XML::TokeParser. However,
unlike XML::TokeParser, XML::Reader records the full XML path as
it processes the XML-document. It can therefore target not only
specific tags, but also a full path of nested element tags
(a simplified XPath-like expression).
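
To illustrate the idea of recording the full path inside a loop that the
caller drives, here is a minimal sketch in plain Perl. It is not
XML::Reader's implementation or API: make_path_iterator is a hypothetical
helper, and its toy regex tokenizer assumes small, well-formed XML without
comments, CDATA, processing instructions, or '>' inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Reader): returns an iterator that
# yields [path, text] pairs, one per non-blank text chunk.
sub make_path_iterator {
    my ($xml) = @_;
    my @stack;      # names of the currently open elements
    my $pos = 0;    # explicit scan position inside $xml

    return sub {
        while ($pos < length $xml) {
            pos($xml) = $pos;
            if ($xml =~ m{\G</([^\s>]+)\s*>}gc) {             # end tag
                $pos = pos $xml;
                pop @stack;
            }
            elsif ($xml =~ m{\G<([^\s/>]+)[^>]*?(/?)>}gc) {   # start or empty tag
                $pos = pos $xml;
                push @stack, $1 unless $2;   # '<a/>' opens nothing
            }
            elsif ($xml =~ m{\G([^<]+)}gc) {                  # text
                $pos = pos $xml;
                my $text = $1;
                next unless $text =~ /\S/;   # skip whitespace-only text
                $text =~ s/^\s+|\s+$//g;
                return [ '/' . join('/', @stack), $text ];
            }
            else {
                die "cannot parse at offset $pos";
            }
        }
        return;     # document exhausted
    };
}

my $xml  = '<data><item><a>one</a><b>two</b></item><item><a>three</a></item></data>';
my $next = make_path_iterator($xml);

# The caller drives the loop, just like reading lines from a filehandle.
my @hits;
while (my $rec = $next->()) {
    my ($path, $text) = @$rec;
    push @hits, $text if $path eq '/data/item/a';   # target a full path
}
print "@hits\n";    # prints "one three"
```

Because the iterator keeps a stack of open element names, a test on the
complete path ('/data/item/a') is as cheap as a test on the tag name alone,
and no callback functions are involved.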

I would say that XML::Reader fills an ecological niche that is
filled neither by XML::Twig nor by XML::TokeParser.

Regards,
Klaus
