On Tue, May 11, 2010 at 7:49 AM, Jonathan Rockway <j...@jrock.us> wrote:
> * On Tue, Apr 27 2010, Klaus wrote:
> > I have released XML::Reader (ver 0.34)
> > http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm

By the way, I have now released a new version of XML::Reader (ver 0.35)
with some bug fixes, warts removed, relicensing, etc.
http://search.cpan.org/~keichner/XML-Reader-0.35/lib/XML/Reader.pm

> > To explain the module, I have created a small demonstration program
> > that extracts XML-subtrees (for example any path that ends with
> > '/.../a') memory efficiently.
> >
> > An XML document can be very large (possibly many gigabytes), but is
> > composed of XML-subtrees, each of which is only a few kilobytes in
> > size. The demonstration program reads XML-subtrees one by one, only
> > the memory for one subtree is held at a time. Each subtree can then be
> > processed further at your convenience (for example by using regular
> > expressions, or, by using other XML-Modules, such as XML::Simple). In
> > principle, XML::Reader has no event driven callback functions, you
> > have to loop over the XML-document yourself and the resulting
> > XML-subtree is represented in text format.
>
> So apparently I am rather behind on module-authors, but I just thought
> I'd ask if you've taken a look at XML::Twig? That seems to be the main
> module for this sort of thing, and seems to have an established
> userbase. Maybe patches to that would be more productive than
> reinventing the wheel?

Thanks for your message. I would position XML::Reader in the same space
as XML::Twig and XML::TokeParser. I have taken a look at XML::Twig,
which has an established userbase, and I agree that XML::Reader
duplicates much of the functionality already provided by XML::Twig.

However, unlike XML::Twig, XML::Reader does not rely on callback
functions to parse the XML. With XML::Reader you loop over the
XML-document yourself and the resulting XML-elements (and/or
XML-subtrees) are represented in text format. This style of processing
XML is similar to the classic pattern:

    open my $fh, '<', 'file.txt';
    while (<$fh>) { do_sth($_); }
    close $fh;

This pattern is also implemented by XML::TokeParser. However, unlike
XML::TokeParser, XML::Reader records the full XML path as it processes
the XML-document. It can therefore target not only specific tags, but
also a full path of nested element tags (a simplified XPath-like
expression).

I would say that XML::Reader fills an ecological niche that is filled
neither by XML::Twig nor by XML::TokeParser.

Regards,
Klaus
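
To illustrate the idea of a pull-style loop that tracks the full element path, here is a toy sketch in core Perl. It is not XML::Reader's API or implementation: make_reader, the sample document and the '/item/a' target are all made up for this example. The regex tokenizer only copes with plain start tags, end tags and text (no XML declaration, comments, CDATA, entities, or attribute values containing '>').

```perl
use strict;
use warnings;

# Toy pull-style reader (hypothetical, NOT XML::Reader's API).
# Returns a closure; each call yields the next (path, text) pair,
# or an empty list once the input is exhausted.
sub make_reader {
    my ($xml) = @_;
    my @stack;                                  # names of the currently open elements
    my @tokens = $xml =~ m{(<[^>]+>|[^<]+)}g;   # split input into tags and text runs
    return sub {
        while (defined(my $tok = shift @tokens)) {
            if ($tok =~ m{^</}) {               # end tag: leave the element
                pop @stack;
            }
            elsif ($tok =~ m{^<([^\s/>]+)}) {   # start tag: enter the element
                my $name = $1;
                push @stack, $name unless $tok =~ m{/>$};   # self-closing: nothing to enter
            }
            elsif ($tok =~ /\S/) {              # non-blank text: report it with its path
                (my $text = $tok) =~ s/^\s+|\s+$//g;
                return ('/' . join('/', @stack), $text);
            }
        }
        return;                                 # no more tokens
    };
}

my $xml  = '<data><item><a>one</a><b>skip</b></item><item><a>two</a></item></data>';
my $next = make_reader($xml);

# The caller drives the loop, just like while (<$fh>) { ... },
# and filters on the full path rather than on the tag name alone.
my @found;
while (my ($path, $value) = $next->()) {
    push @found, "$path=$value" if $path =~ m{/item/a$};
}
print "$_\n" for @found;
```

With the sample document above, the loop prints "/data/item/a=one" and "/data/item/a=two" and skips the text under /data/item/b. The point is only that the caller owns the loop and sees the whole path at each step, so no callbacks are needed.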