Re: Need to process a XML file through Perl.

Philip Potter Wed, 02 Dec 2009 01:27:05 -0800

2009/12/2 Parag Kalra <paragka...@gmail.com>:
> Currently I am planning to process the above requirement using simple Perl
> regex. But I feel it can be made simpler using any of the available modules.
>
> So I have following questions:
> 1. Which are the best available XML modules for Perl?
> 2. Out of the best available modules, which one would suit my
> requirement in the best way?
> 3. Any pointers to specific methods of the XML modules to suffice my
> needs would be helpful.
> TIA


In general, regexes should be avoided for languages that are
context-free or more complex, a category that includes XML, HTML and
SGML.

perlbot on #perl on freenode gives this stock response: "Don't parse
XML with regex! Use a real parser. Avoid XML::Simple (see the
xml::simple factoid). Choices are ::Easy, ::Smart, ::Twig for simple
stuff. LibXML for big stuff. See also XML::All.
http://perl-xml.sf.net/faq/"

That's a good list of XML modules and a good page for much more
comprehensive information.
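
As a pointer to specific methods, here is a minimal sketch of what a
parser-based approach can look like with XML::LibXML (a CPAN module,
not part of core Perl). The input document and its element names
(inventory, item, name, qty) are assumptions made up for illustration,
since the actual file is not shown in this thread.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module; install it separately

# Hypothetical input: these element and attribute names are invented
# for this sketch and are not taken from the original poster's file.
my $xml = <<'XML';
<inventory>
  <item id="a1"><name>Widget</name><qty>3</qty></item>
  <item id="b2"><name>Gadget &amp; Co</name><qty>7</qty></item>
</inventory>
XML

# Parse the string into a DOM tree; the parser handles entities,
# quoting and nesting for us.
my $doc = XML::LibXML->load_xml(string => $xml);

# Select nodes with XPath and read attributes and child text.
my @rows;
for my $item ($doc->findnodes('/inventory/item')) {
    push @rows, {
        id   => $item->getAttribute('id'),
        name => $item->findvalue('name'),
        qty  => $item->findvalue('qty'),
    };
}

printf "%s: %s (%d)\n", @{$_}{qw(id name qty)} for @rows;
```

To read from a file instead of a string, load_xml also accepts
location => $path. The other modules in the list have their own
interfaces, so treat this only as one possible starting point.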

I would add, however, that it really depends on what you intend to do.
Regexes are useful in a very limited set of conditions:
* If the XML is simple, regular, and predictable, and it has no danger
of getting more complex as time progresses
* If you only want to extract or modify one highly localised feature
from the file, and if what you want to access will not change or grow
more complex

And of course if these conditions are satisfied then the question "Why
are you using XML?" is raised. A valid answer might be "The
information I need is in XML produced by someone else". In Hack #23 of
Spidering Hacks, David Landgren gives a case study where he had some
printers which reported their status (ink and paper remaining) in an
XML file served from a web server. It was such a small and
simply-structured amount of data, being generated by a machine which
was predictable and which wouldn't change the structure, that regexes
were the simplest way to extract the information from them.
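
A rough sketch of that kind of extraction is below. The status
document and its element names (status, ink, paper) are hypothetical,
invented here for illustration; this is not the code or the format
from the Spidering Hacks case study.

```perl
use strict;
use warnings;

# Hypothetical machine-generated status document. The structure is
# assumed to be fixed: one <ink> and one <paper> element, each holding
# a plain integer, with no attributes and no nesting.
my $status = <<'XML';
<status>
  <ink>42</ink>
  <paper>180</paper>
</status>
XML

# Collect each numeric level, keyed by element name. The \1
# backreference requires the closing tag to match the opening one.
my %level;
while ($status =~ m{<(ink|paper)>(\d+)</\1>}g) {
    $level{$1} = $2;
}

print "ink=$level{ink} paper=$level{paper}\n";
```

This only stays simple because every assumption in the comments holds.
As soon as the device adds attributes, nesting or free text, the
pattern has to grow to match.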

However, there are many contraindications: signs that a regex solution
will become more complex and harder to maintain than the equivalent
parser-based implementation. While regexes are (IMHO) acceptable for
extracting information from fixed- or restricted-structure, regular,
machine-generated XML, they will be the Wrong Thing entirely if the
XML has arbitrary text in attributes (e.g. <tag attrib="></tag>">),
arbitrary tag nesting structure, an evolving or extensible format, or
if you may want to retarget your parser at a different XML format
in the future. In general, if unsure, use a real parser.
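
To make the attribute case concrete, here is a small sketch of a
typical naive pattern going wrong. A literal "<" inside an attribute
value would have to be written as &lt; in well-formed XML, so this
sketch uses a variant whose attribute contains only ">", which is
legal. The pattern itself is just an assumed example of what a first
attempt often looks like.

```perl
use strict;
use warnings;

# Well-formed XML: ">" is allowed unescaped inside an attribute value.
my $xml = '<tag attrib="1 > 0">real content</tag>';

# Naive pattern: opening tag up to the first ">", a lazily captured
# body, then the closing tag.
my ($body) = $xml =~ m{<tag[^>]*>(.*?)</tag>};

# The ">" inside the attribute is mistaken for the end of the opening
# tag, so the capture includes the tail of the attribute.
print "regex captured: '$body'\n";    # prints: regex captured: ' 0">real content'
```

The pattern can be patched to cope with quoted attribute values, but
each patch adds complexity that a real parser already handles.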

Phil
