If you are sure that the element <record code=xxxx> is always written
on a separate line, you could try:
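open(my $fh, '<', 'myfile.xml') or die "Cannot open myfile.xml: $!";
while (my $line = <$fh>) {
    # Print the value of the code attribute from each <record> start tag
    if ($line =~ /<record code="([^"]*)">/) {
        print "$1\n";
    }
}
close($fh);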
This is not a real XML parse, but it reads the file line by line, so it
runs much faster and needs very little memory.
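
The same line-by-line idea can also cut a small test file out of the big
one, so you can experiment without loading 1.7GB each time. This is only a
sketch: write_sample is a name I made up, and it assumes that the root
element is <records> as in your sample, that each record ends with a
</record> tag (the posted sample is snipped before that point), and that
at most one such tag appears per line.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out until $max_records
# closing </record> tags have been seen, then close the root element.
# Assumes at most one </record> per line and a <records> root element.
sub write_sample {
    my ($in, $out, $max_records) = @_;
    my ($seen, $root_closed) = (0, 0);
    while (my $line = <$in>) {
        print {$out} $line;
        # Remember whether the original closing root tag was copied.
        $root_closed = 1 if $line =~ m{</records>};
        if ($line =~ m{</record>}) {
            last if ++$seen >= $max_records;
        }
    }
    # Add the closing root tag only if we stopped before reaching it.
    print {$out} "</records>\n" unless $root_closed;
    return $seen;
}

# Usage sketch (file names are placeholders):
# open(my $in,  '<', 'myfile.xml') or die "Cannot open myfile.xml: $!";
# open(my $out, '>', 'sample.xml') or die "Cannot open sample.xml: $!";
# write_sample($in, $out, 300);
# close($out);
# close($in);
```

The resulting sample.xml should be small enough to feed to XML::XPath
while you work out the right query.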
Beginner wrote:
Hi,
I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract
an attribute from each record (code=). I have several problems, one of
which is that the size of the file makes it painful to test my scripts
and methods for parsing.
I would like to extract a few hundred records (by any means) so I can
experiment. I think XPath is the way to go here. The file
(currently) sits on a *nix system but I was going to do the parsing
on a Win32 workstation rather than steal all the memory on a
server.
Below is a sample of some data. I have XML::XPath installed; there
doesn't seem to be a libXML2 for Win32. This is my first effort but
I haven't been able to run it fully as my workstation began to page
severely after a while. So I would like some hints before I try again,
as each attempt takes ages.
TIA,
Dp.
=======
#!/bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;
my $xmp = XML::XPath->new(filename => 'myfile.xml');
my $nodeset = $xmp->find('/records/record/');
foreach my $node ($nodeset->get_nodelist) {
my $attrib = $node->getNodeType('ATTRIBUTE_NODE');
print "$attrib\n";
}
=====
<?xml version = "1.0" encoding= "utf-8"?>
<records>
<record code="65020/0002">
<display_number>65020/003</display_number>
<title>Moulded resistors in synthetic resin</title>
<created_date>05-Mar-85</created_date>
<updated_date>15-Nov-07</updated_date>
<restrictions>
</restrictions>
</image>
...snip