If you are sure that each <record code="xxxx"> element is always written on a separate line,
you could try:
# $my_file holds the path to the XML file
open(my $fh, '<', $my_file) or die "Cannot open $my_file: $!";
while (<$fh>) {
   if (/<record code="([^"]*)">/) {
      print "$1\n";
   }
}
close($fh);

This is not real XML parsing, but it runs much faster and reads only one line into memory at a time.
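
The same line-by-line idea could also cut a small test file out of the big one. Below is a minimal sketch, not a tested tool. It assumes that every <record code=...> opening tag sits on its own line, that the root element is <records>, and that nothing but records follows the root opening tag. The sub name copy_first_records, the limit of 300 and the file names on the command line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in to $out until record number $limit + 1 starts,
# then close the root element so the sample stays well-formed.
# Returns the number of records copied.
sub copy_first_records {
    my ($in, $out, $limit) = @_;
    my $seen = 0;
    while (my $line = <$in>) {
        # \s+ keeps this from matching the <records> root tag
        if ($line =~ /<record\s+code=/) {
            last if $seen == $limit;
            $seen++;
        }
        # Skip the original closing root tag; it is re-added below
        next if $line =~ m{^\s*</records>};
        print {$out} $line;
    }
    print {$out} "</records>\n";
    return $seen;
}

# Hypothetical usage: perl sample.pl myfile.xml sample.xml
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open(my $in,  '<', $src) or die "Cannot open $src: $!";
    open(my $out, '>', $dst) or die "Cannot open $dst: $!";
    my $count = copy_first_records($in, $out, 300);
    close($in);
    close($out) or die "Cannot close $dst: $!";
    print STDERR "Copied $count records to $dst\n";
}
```

The sample file is then small enough to load with XML::XPath while you experiment, and the full file never has to be held in memory.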

Beginner wrote:
Hi,

I have a huge XML file: 1.7GB, 53080215 lines. I am trying to extract an attribute (code=) from each record. I have several problems, one of which is that the size of the file makes it painful to test my scripts and parsing methods.

I would like to extract a few hundred records (by any means) so I can experiment. I think XPath is the way to go here. The file (currently) sits on a *nix system, but I was going to do the parsing on a Win32 workstation rather than steal all the memory on a server.

Below is a sample of some data. I have XML::XPath installed; there doesn't seem to be a libXML2 for Win32. This is my first effort, but I haven't been able to run it fully because my workstation began to page severely after a while. So I would like some hints before I try again, as each attempt takes ages.

TIA,
Dp.


=======
#!/bin/perl

use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

my $xmp = XML::XPath->new(filename => 'myfile.xml');

my $nodeset = $xmp->find('/records/record/');

foreach my $node ($nodeset->get_nodelist) {
        my $attrib = $node->getNodeType('ATTRIBUTE_NODE');
        print "$attrib\n";
}
=====

<?xml version = "1.0" encoding= "utf-8"?>
<records>
        <record code="65020/0002">
                    <display_number>65020/003</display_number>
                <title>Moulded resistors in synthetic resin</title>
                <created_date>05-Mar-85</created_date>
                <updated_date>15-Nov-07</updated_date>
                <restrictions>
                </restrictions>
        </image>
...snip


