On Dec 10, 2007 8:24 AM, Tim Bowden <[EMAIL PROTECTED]> wrote:
>
> On Mon, 2007-12-10 at 13:14 +0000, Beginner wrote:
> > Hi,
> >
> > I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract
> > an attribute from each record (code=). I have several problems, one of
> > which is that the size of the file makes it painful to test my scripts
> > and methods for parsing.
> >
> > I would like to extract a few hundred records (by any means) so I can
> > experiment.  I think XPath is the way to go here. The file
> > (currently) sits on a *nix system, but I was going to do the parsing
> > on a Win32 workstation rather than steal all the memory on a
> > server.
> If your data file is on a *nix system, use
> head -200 filename > sample_filename to take the first 200 records.
snip

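Unfortunately that won't work with structured data like XML.  head
counts lines, not records, so the sample would almost certainly be cut
off in the middle of an element and would not be well-formed.  Your
best bet is to use something like XML::Twig to grab the top level
records and output them to a new file.  For instance, say we have an
XML file that looks like this: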

<root>
        <records set="1">
                <record>foo</record>
                <record>bar</record>
                <record>baz</record>
        </records>
        <records set="2">
                <record>quux</record>
        </records>
        <records set="3">
                <record>foofoo</record>
                <record>foobar</record>
        </records>
</root>

and we only want the first two sets of records.  We could use this
code to print only those records to standard output, which can then be
redirected to a new file:

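#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $i = 0;
my $t = XML::Twig->new(
        twig_handlers => {
                records => sub {
                        my ($twig, $elt) = @_;
                        # print this set, then free the memory it used
                        # (flush would print the set a second time)
                        $elt->print;
                        $twig->purge;
                        # stop parsing after the second set; unlike exit,
                        # this still lets the closing tag below be printed
                        $twig->finish_now if ++$i >= 2;
                }
        }
);

print "<root>";
$t->parsefile("t.xml");
print "</root>\n";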
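
Since the end goal is the code= attribute of each record, the same
handler approach can probably pull those out directly without writing
a sample file first.  This is only a sketch: I am guessing that the
records are <record> elements and that the attribute is literally
named "code", and extract_codes and the limit of 200 are made up for
the example, so adjust all of that to the real file.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Collect the code="..." attribute of up to $limit <record> elements.
# Assumption: the element name "record" and the attribute name "code"
# are guesses at the real file's layout.
sub extract_codes {
        my ($file, $limit) = @_;
        my @codes;
        my $t = XML::Twig->new(
                twig_handlers => {
                        record => sub {
                                my ($twig, $elt) = @_;
                                my $code = $elt->att('code');
                                push @codes, $code if defined $code;
                                # free everything parsed so far
                                $twig->purge;
                                # stop reading once enough codes are collected
                                $twig->finish_now if @codes >= $limit;
                        }
                }
        );
        $t->parsefile($file);
        return \@codes;
}

# usage: perl codes.pl big.xml
if (@ARGV) {
        print "$_\n" for @{ extract_codes($ARGV[0], 200) };
}
```

Because each record is purged as soon as it has been looked at, memory
use should stay roughly constant however large the file is, and
finish_now means the rest of the 1.7GB is never read while you are
still experimenting.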
