Extract attribute from huge xml file

Beginner Mon, 10 Dec 2007 05:15:03 -0800

Hi,

I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract 
an attribute from each record (code=). I have several problems, one of 
which is that the size of the file makes it painful to test my scripts 
and methods for parsing.


I would like to extract a few hundred records (by any means) so I can 
experiment.  I think XPath is the way to go here. The file 
(currently) sits on a *nix system, but I was going to do the parsing 
on a Win32 workstation rather than steal all the memory on a 
server.
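
One possible way to get a small test file is sketched below. It is not 
an XML-aware tool, only a line-by-line copy that stops after a given 
number of records. It assumes each closing </record> tag sits on its 
own line and that the root element is <records>. The sub name 
write_sample and the file names in the usage line are made up for 
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy the first $limit <record>...</record> blocks from $in to $out,
# reading one line at a time so memory use stays flat.
# Assumption: every closing </record> tag is on a line of its own.
sub write_sample {
    my ($in, $out, $limit) = @_;

    open my $src, '<', $in  or die "Cannot read $in: $!";
    open my $dst, '>', $out or die "Cannot write $out: $!";

    my $seen = 0;
    while (my $line = <$src>) {
        # Stop at the original root close tag (file shorter than $limit).
        last if $line =~ m{</records>};

        print {$dst} $line;

        # Count a record each time its closing tag goes past.
        next unless $line =~ m{</record>};
        last if ++$seen >= $limit;
    }

    # Close the root element so the sample is well-formed XML.
    print {$dst} "</records>\n";

    close $dst or die "Cannot close $out: $!";
    close $src;
    return $seen;
}

# Hypothetical usage: perl sample.pl myfile.xml sample.xml 300
if (@ARGV == 3) {
    my $count = write_sample(@ARGV);
    print "Wrote $count records\n";
}
```

Because nothing is held in memory beyond the current line, this should 
be safe to run on the server where the file lives, and only the small 
sample then needs to be copied to the workstation.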

Below is a sample of some data. I have XML::XPath installed; there 
doesn't seem to be a libXML2 for Win32. This is my first effort, but 
I haven't been able to run it fully as my workstation began to page 
severely after a while. So I would like some hints before I try again, 
as each attempt takes ages.

TIA,
Dp.


=======
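#!/bin/perl

use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

# Note: XML::XPath builds the whole document tree in memory.
my $xmp = XML::XPath->new(filename => 'myfile.xml');

# Select every <record> element under the root (no trailing slash).
my $nodeset = $xmp->find('/records/record');

foreach my $node ($nodeset->get_nodelist) {
        # Read the code= attribute of this record.
        my $attrib = $node->getAttribute('code');
        print "$attrib\n";
}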
=====

<?xml version = "1.0" encoding= "utf-8"?>
<records>
        <record code="65020/0002">
                <display_number>65020/003</display_number>
                <title>Moulded resistors in synthetic resin</title>
                <created_date>05-Mar-85</created_date>
                <updated_date>15-Nov-07</updated_date>
                <restrictions>
                </restrictions>
        </record>
...snip
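
For the paging problem, a stream parser may be worth a try, since it 
visits each element once and never builds the whole tree. The sketch 
below uses XML::Parser, which XML::XPath itself depends on, so it 
should already be installed. It assumes the element is named record 
and the attribute is named code, as in the sample above. The sub name 
each_record_code is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Call $callback with the code= value of every <record> element in $file.
# Only a Start handler is registered, so no document tree is built.
sub each_record_code {
    my ($file, $callback) = @_;

    my $parser = XML::Parser->new(
        Handlers => {
            # Start receives the Expat object, the element name,
            # then the attributes as name/value pairs.
            Start => sub {
                my ($expat, $element, %attr) = @_;
                return unless $element eq 'record';
                $callback->($attr{code}) if defined $attr{code};
            },
        },
    );

    # Dies with a line number if the file is not well-formed.
    $parser->parsefile($file);
    return;
}

# Hypothetical usage: perl codes.pl myfile.xml > codes.txt
if (@ARGV == 1) {
    binmode STDOUT, ':encoding(UTF-8)';
    each_record_code($ARGV[0], sub { print "$_[0]\n" });
}
```

Memory use should stay roughly constant regardless of file size, 
though a full pass over 1.7GB will still take a while.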
