From: "Jenda Krynicky" <je...@krynicky.cz>
> From: "Octavian Rasnita" <orasn...@gmail.com>
> To: <beginners@perl.org>
> Subject: Fast XML parser?
> Date sent: Thu, 25 Oct 2012 14:33:15 +0300
>
>> Hi,
>>
>> Can you recommend an XML parser which is faster than XML::Twig?
>>
>> I need to use an XML parser that can parse the XML files chunk by chunk and
>> which works faster (much faster) than XML::Twig, because I tried using this
>> module but it is very slow.
>>
>> I tried something like the code below, but I have also tried a version
>> that just opens the file and parses it using regular expressions,
>> however the unelegant regexp version is 25 times faster than the one
>> which uses XML::Twig, and it also uses less memory.
>>
>> If you think there is a module for parsing XML which would work faster
>> than regular expressions, or if I can substantially improve the
>> program which uses XML::Twig then please tell me about it. If regexp
>> will still be faster, I will use regexp.
>
> You did not specify what do you want to do with the lexemes anyway
> you might try something like this:
>
> use strict;
> use XML::Rules;
> use Data::Dumper;
>
> my $parser = XML::Rules->new(
>     stripspaces => 7,
>     rules => {
>         _default => 'content',
>         InflectedForm => 'as array',
>         Lexem => sub {
>             #print Dumper($_[1]);
>             print "$_[1]->{Form}\n";
>             foreach (@{$_[1]->{InflectedForm}}) {
>                 print " $_->{InflectionId}: $_->{Form}\n";
>             }
>         },
>     }
> );
>
> $parser->parse(\*DATA);
>
> __DATA__
> <?xml version="1.0" encoding="UTF-8"?>
> <Lexems>
> <Lexem id="1">
> ...
>
> XML::Rules sits on top of XML::Parser::Expat so I would not expect
> this to be 25 times faster than XML::Twig, but it might be a bit
> quicker. Or not.
>
> Jenda

Hi Jenda,

I tried your program above, modified as below, but it gives the error:

Free to wrong pool 3967d8 not 20202020 at e:/usr/lib/XML/Parser/Expat.pm line 470.

I was able to install XML::Rules under Windows using cpanm with no
problems, so it should be working...

The program:

use strict;
use XML::Rules;
use Data::Dumper;

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        _default => 'content',
        InflectedForm => 'as array',
        Lexem => sub {
            #print Dumper($_[1]);
            #print "$_[1]->{Form}\n";
            foreach (@{$_[1]->{InflectedForm}}) {
                #print " $_->{InflectionId}: $_->{Form}\n";
            }
        },
    }
);

my $file = '/path/to/file.xml';
open my $xml, '<:utf8', $file or die "Cannot open $file: $!";
$parser->parse( $xml );

Thanks.

Octavian