Re: Fast XML parser?

Octavian Rasnita Mon, 29 Oct 2012 05:33:28 -0700

From: "Shlomi Fish" <shlo...@shlomifish.org>
On Mon, 29 Oct 2012 10:09:53 +0200
Shlomi Fish <shlo...@shlomifish.org> wrote:


> Hi Octavian,
> 
> On Sun, 28 Oct 2012 17:45:15 +0200
> "Octavian Rasnita" <orasn...@gmail.com> wrote:
> 
> > From: "Shlomi Fish" <shlo...@shlomifish.org>
> > 
> > Hi Octavian,
> > 
> > 
> > 
> > Hi Shlomi,
> > 
> > I tried to use XML::LibXML::Reader which uses the pull parser, and I read 
> > that:
> > 
> > ""
> > However, it is also possible to mix Reader with DOM. At every point the
> > user may copy the current node (optionally expanded into a complete
> > sub-tree) from the processed document to another DOM tree, or to
> > instruct the Reader to collect sub-document in form of a DOM tree
> > ""
> > 
> > So I tried:
> > 
> > use XML::LibXML::Reader;
> > 
> > my $xml = 'path/to/xml/file.xml';
> > 
> > my $reader = XML::LibXML::Reader->new( location => $xml ) or die "cannot read $xml";
> > 
> > while ( $reader->nextElement( 'Lexem' ) ) {
> >     my $id = $reader->getAttribute( 'id' ); #works fine
> > 
> >     my $doc = $reader->document;
> > 
> >     my $timestamp = $doc->getElementsByTagName( 'Timestamp' ); #Doesn't work well
> >     my @lexem_text = $doc->getElementsByTagName( 'Form' ); #Doesn't work fine
> > 
> > }
> > 
> 
> I'm not sure you should do ->document. I cannot tell you off-hand how to do it
> right, but I can try to investigate when I have some spare cycles.
> 

OK, after a short amount of investigation, I found that this program works:

[CODE]

use strict;
use warnings;

use XML::LibXML::Reader;

my $xml = 'Lexems.xml';

my $reader = XML::LibXML::Reader->new( location => $xml )
    or die "cannot read $xml";

while ( $reader->nextElement( 'Lexem' ) ) {
    my $id = $reader->getAttribute( 'id' ); #works fine

    # Deep copy (argument 1) of the current <Lexem> subtree as a DOM node.
    my $doc = $reader->copyCurrentNode(1);
    my $timestamp = $doc->getElementsByTagName( 'Timestamp' );
    my @lexem_text = $doc->getElementsByTagName( 'Form' );
}

[/CODE]

Note that you can also use XPath for looking up XML information.
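
A minimal sketch of the XPath approach, assuming the element names used in this thread (Lexem, Timestamp, Form, InflectedForm, InflectionId). The inline sample document and its values are invented for illustration, and the reader is fed from a string rather than a file so that the example is self-contained:

```perl
use strict;
use warnings;

use XML::LibXML::Reader;

# Hypothetical sample input: element names follow the thread, values are invented.
my $xml = <<'XML';
<Lexems>
  <Lexem id="1">
    <Timestamp>1351500000</Timestamp>
    <Form>carte</Form>
    <InflectedForm>
      <InflectionId>5</InflectionId>
      <Form>cartea</Form>
    </InflectedForm>
  </Lexem>
</Lexems>
XML

my $reader = XML::LibXML::Reader->new( string => $xml )
    or die "cannot create reader";

my @lexems;
while ( $reader->nextElement('Lexem') ) {
    # Deep copy of the current <Lexem> subtree as a DOM node.
    my $node = $reader->copyCurrentNode(1);

    # findvalue() returns the text content as a plain string;
    # './Name' only matches direct children of the context node.
    my @inflected = map {
        +{
            id   => $_->findvalue('./InflectionId'),
            form => $_->findvalue('./Form'),
        }
    } $node->findnodes('./InflectedForm');

    push @lexems, {
        id        => $node->getAttribute('id'),
        timestamp => $node->findvalue('./Timestamp'),
        form      => $node->findvalue('./Form'),
        inflected => \@inflected,
    };
}

for my $lexem (@lexems) {
    print "$lexem->{id}: $lexem->{form} ($lexem->{timestamp})\n";
    print "  $_->{id}: $_->{form}\n" for @{ $lexem->{inflected} };
}
```

Using findvalue() instead of getElementsByTagName() gives the text directly, and the './' prefix keeps the top-level Form apart from the Form elements nested inside InflectedForm.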

Regards,

Shlomi Fish


-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/



I followed your suggestion and it works fine, but it is very slow.

Here is what I did:

while ( $reader->nextElement( 'Lexem' ) ) {
    my $id = $reader->getAttribute( 'id' );

    my $doc = $reader->copyCurrentNode(1);

    my $timestamp = $doc->findnodes( 'Timestamp' );
    my $lexem_text = $doc->findnodes( 'Form' );

    my $inflected_forms = $doc->findnodes( 'InflectedForm' );

    for my $inflected_form ( $inflected_forms->get_nodelist ) {
        my $inflection_id = $inflected_form->findnodes( './InflectionId' );
        my $inflection_dia = $inflected_form->findnodes( './Form' );
    }
}

I tried to find a way of using XPath but I couldn't find a good one, and it 
seems that copying that node takes a pretty long time.
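
For comparison, here is a rough sketch of one way to skip the copy altogether: walk each Lexem subtree with the reader itself and pick up the text of the interesting elements by name and depth. The sample data and the element_text() helper are hypothetical, the depth checks assume that InflectionId and the nested Form only appear directly inside InflectedForm, and whether this is actually faster on a large file would need measuring:

```perl
use strict;
use warnings;

# The XML_READER_TYPE_* constants are exported by default.
use XML::LibXML::Reader;

# Hypothetical sample input: element names follow the thread, values are invented.
my $xml = <<'XML';
<Lexems>
  <Lexem id="1">
    <Timestamp>1351500000</Timestamp>
    <Form>carte</Form>
    <InflectedForm><InflectionId>5</InflectionId><Form>cartea</Form></InflectedForm>
  </Lexem>
  <Lexem id="2">
    <Timestamp>1351500001</Timestamp>
    <Form>casa</Form>
  </Lexem>
</Lexems>
XML

# Hypothetical helper: collect the text inside the current element and
# leave the reader on that element's end tag.
sub element_text {
    my ($reader) = @_;
    return '' if $reader->isEmptyElement;
    my $depth = $reader->depth;
    my $text  = '';
    while ( $reader->read == 1 && $reader->depth > $depth ) {
        my $type = $reader->nodeType;
        $text .= $reader->value
            if $type == XML_READER_TYPE_TEXT || $type == XML_READER_TYPE_CDATA;
    }
    return $text;
}

my $reader = XML::LibXML::Reader->new( string => $xml )
    or die "cannot create reader";

my @lexems;
while ( $reader->nextElement('Lexem') ) {
    my %lexem = ( id => $reader->getAttribute('id'), inflected => [] );
    push @lexems, \%lexem;
    next if $reader->isEmptyElement;

    my $top = $reader->depth;
    # Step through the subtree until the reader reaches </Lexem>.
    while ( $reader->read == 1 && $reader->depth > $top ) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        my ( $name, $depth ) = ( $reader->name, $reader->depth );

        if ( $depth == $top + 1 ) {
            if    ( $name eq 'Timestamp' )     { $lexem{timestamp} = element_text($reader) }
            elsif ( $name eq 'Form' )          { $lexem{form}      = element_text($reader) }
            elsif ( $name eq 'InflectedForm' ) { push @{ $lexem{inflected} }, {} }
        }
        elsif ( $depth == $top + 2 && @{ $lexem{inflected} } ) {
            # Assumed to be a child of the most recent <InflectedForm>.
            my $current = $lexem{inflected}[-1];
            if    ( $name eq 'InflectionId' ) { $current->{id}   = element_text($reader) }
            elsif ( $name eq 'Form' )         { $current->{form} = element_text($reader) }
        }
    }
}

print scalar(@lexems), " lexems read\n";
```

The trade-off is more bookkeeping in exchange for never building a DOM subtree per Lexem.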

Octavian


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
