Re: Extract attribute from huge xml file

patmarbidon Mon, 10 Dec 2007 23:22:43 -0800

If you are sure that the <record code=xxxx> element is always written on a separate line, you should try:
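open (FILE, '<', 'my_file') or die "Cannot open my_file: $!";
while (<FILE>) {
   # capture the value of the code attribute on this line
   if (/<record code="([^"]*)">/) {
      print "$1\n" ;
   }
}
close (FILE) ;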


It is not real XML parsing, but it runs much faster and reads only one line at a time, so it needs very little memory.
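
Since the original post also asks for a few hundred records to experiment with, the same line-by-line idea could be used to cut a small sample out of the big file. The following is only a sketch under assumptions: every <record ...> start tag begins on its own line, the root element is <records> as in the sample data, and the sub name copy_sample and the command-line arguments are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy the first $limit records from filehandle $in to filehandle $out.
# Assumption: each <record ...> start tag begins on its own line.
# Returns the number of records copied.
sub copy_sample {
    my ($in, $out, $limit) = @_;
    my $seen      = 0;
    my $truncated = 0;
    while (my $line = <$in>) {
        # "<record " matches a record start tag but not <records> or </record>
        if ($line =~ /<record\s/) {
            if ($seen == $limit) {
                # Stop before the record that would exceed the limit.
                $truncated = 1;
                last;
            }
            $seen++;
        }
        print {$out} $line;
    }
    # If we stopped early, close the root element so the sample stays
    # well-formed (assumes the root element is <records>).
    print {$out} "</records>\n" if $truncated;
    return $seen;
}

# Hypothetical usage: perl sample.pl myfile.xml sample.xml 300
if (@ARGV == 3) {
    my ($src, $dst, $limit) = @ARGV;
    open my $in,  '<', $src or die "Cannot open $src: $!";
    open my $out, '>', $dst or die "Cannot open $dst: $!";
    my $n = copy_sample($in, $out, $limit);
    close $in;
    close $out or die "Cannot close $dst: $!";
    print "Copied $n records to $dst\n";
}
```

The resulting sample file should be small enough to test an XML::XPath script against without waiting on the full 1.7GB file. If the records are not laid out one start tag per line, this approach will not work and a real XML parser is needed.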

Beginner wrote:

Hi,
I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract an attribute from each record (code=). I have several problems, one of which is that the size of the file is making it painful to test my scripts and methods for parsing.
I would like to extract a few hundred records (by any means) so I can experiment. I think XPath is the way to go here. The file (currently) sits on a *nix system but I was going to do the parsing on a Win32 workstation rather than steal all the memory on a server.
Below is a sample of some data. I have XML::XPath installed; there doesn't seem to be a libXML2 for Win32. This is my first effort but I haven't been able to run it fully as my workstation began to page severely after a while. So I would like some hints before I try again, as each attempt takes ages.
TIA,
Dp.


=======
#!/bin/perl

use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

my $xmp = XML::XPath->new(filename => 'myfile.xml');

my $nodeset = $xmp->find('/records/record/');
foreach my $node ($nodeset->get_nodelist) {
        my $attrib = $node->getNodeType('ATTRIBUTE_NODE');
        print "$attrib\n";
}
=====

<?xml version = "1.0" encoding= "utf-8"?>
<records>
        <record code="65020/0002">
                    <display_number>65020/003</display_number>
                <title>Moulded resistors in synthetic resin</title>
                <created_date>05-Mar-85</created_date>
                <updated_date>15-Nov-07</updated_date>
                <restrictions>
                </restrictions>
        </image>
...snip



