From:                   "Octavian Rasnita" <orasn...@gmail.com>
To:                     <beginners@perl.org>
Subject:                Fast XML parser?
Date sent:              Thu, 25 Oct 2012 14:33:15 +0300

> Hi,
> 
> Can you recommend an XML parser which is faster than XML::Twig?
> 
> I need to use an XML parser that can parse the XML files chunk by chunk and 
> which works faster (much faster) than XML::Twig, because I tried using this 
> module but it is very slow.
> 
> I tried something like the code below, but I have also tried a version
> that just opens the file and parses it using regular expressions,
> however the inelegant regexp version is 25 times faster than the one
> which uses XML::Twig, and it also uses less memory. 
> 
> If you think there is a module for parsing XML which would work faster
> than regular expressions, or if I can substantially improve the
> program which uses XML::Twig, then please tell me about it. If regexp
> will still be faster, I will use regexp. 

You did not specify what you want to do with the lexemes. Anyway,
you might try something like this:

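use strict;
use warnings;
use XML::Rules;
use Data::Dumper;

my $parser = XML::Rules->new(
        # strip surrounding and whitespace-only text (see the XML::Rules docs)
        stripspaces => 7,
        rules => {
                # by default a tag adds its text content to the parent's hash
                _default => 'content',
                # collect the <InflectedForm> hashes in an array in the parent
                InflectedForm => 'as array',
                # called whenever a <Lexem> has been parsed completely
                Lexem => sub {
                        my ($tag, $attrs) = @_;
                        #print Dumper($attrs);
                        print "$attrs->{Form}\n";
                        foreach my $form (@{$attrs->{InflectedForm}}) {
                                print "  $form->{InflectionId}: $form->{Form}\n";
                        }
                        return; # nothing is kept, so memory use stays low
                },
        }
);

$parser->parse(\*DATA);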

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<Lexems>
  <Lexem id="1">
...

XML::Rules sits on top of XML::Parser::Expat, so I would not expect 
this to be 25 times faster than XML::Twig, but it might be a bit 
quicker. Or not.
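
To make the handler easier to follow: assuming each <Lexem> contains a
<Form> and several <InflectedForm> elements (each with an
<InflectionId> and a <Form>), the rules above should hand the Lexem
handler a hash shaped roughly like the one in this sketch. The values
are made up, and format_lexem is just a name picked for the
illustration. It needs no XML module to run.

```perl
use strict;
use warnings;

# Hypothetical example of the hash the Lexem handler might receive;
# the element names come from the rules above, the values are invented.
my $lexem = {
    id            => 1,          # attribute of <Lexem>
    Form          => 'carte',    # from <Form> via the 'content' rule
    InflectedForm => [           # built by the 'as array' rule
        { InflectionId => 1, Form => 'carte' },
        { InflectionId => 2, Form => 'cartea' },
    ],
};

# Same traversal as the Lexem rule, but returning a string instead of
# printing, which makes it easy to check.
sub format_lexem {
    my ($attrs) = @_;
    my $out = "$attrs->{Form}\n";
    foreach my $form (@{ $attrs->{InflectedForm} || [] }) {
        $out .= "  $form->{InflectionId}: $form->{Form}\n";
    }
    return $out;
}

print format_lexem($lexem);
```

If the real data looks different, uncommenting the Dumper line in the
handler shows the actual structure.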

Jenda
===== je...@krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

