On Dec 10, 2007 8:24 AM, Tim Bowden <[EMAIL PROTECTED]> wrote:
>
> On Mon, 2007-12-10 at 13:14 +0000, Beginner wrote:
> > Hi,
> >
> > I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract
> > an attribute from each record (code=). I have several problems, one of
> > which is that the size of the file is making it painful to test my scripts
> > and methods for parsing.
> >
> > I would like to extract a few hundred records (by any means) so I can
> > experiment. I think XPath is the way to go here. The file
> > (currently) sits on a *nix system but I was going to do the parsing
> > on a Win32 workstation rather than steal all the memory on a
> > server.
>
> If your data file is on a *nix system, use
> head -200 filename > sample_filename
> to take the first 200 records.
snip
Unfortunately that won't work with structured data like XML: head counts lines, not records, so the sample will almost certainly stop in the middle of an element and leave the document unbalanced. Your best bet is to use something like XML::Twig to grab the top-level records and write them out to a new file. For instance, say we have an XML file that looks like this:

<root>
  <records set="1">
    <record>foo</record>
    <record>bar</record>
    <record>baz</record>
  </records>
  <records set="2">
    <record>quux</record>
  </records>
  <records set="3">
    <record>foofoo</record>
    <record>foobar</record>
  </records>
</root>

and we only want the first two sets of records. We could use this code to produce a new file containing only those records (redirect its output to the new file):

#!/usr/bin/perl

use strict;
use warnings;
use XML::Twig;

my $i = 0;
my $t = XML::Twig->new(
    twig_handlers => {
        records => sub {
            my ($twig, $records) = @_;
            if (++$i > 2) {
                # Close the root element before bailing out,
                # otherwise the output is not well-formed XML.
                print "</root>\n";
                exit;
            }
            $records->print;   # print this <records> element
            $twig->purge;      # free what has been parsed so far, without printing it again
        }
    }
);

print "<root>";
$t->parsefile("t.xml");
print "</root>\n";   # only reached if the file has two or fewer sets
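The original question was about pulling the code= attribute out of every record. Here is a minimal sketch of that using the same streaming idea, assuming each record is a <record> element carrying the attribute (the element name and the extract_codes helper are hypothetical; adjust them to the real file). twig_roots tells XML::Twig to build only those elements, and purge discards each one once it has been handled, so memory use should stay roughly flat even on the full 1.7GB file.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: collect the code="..." attribute of every <record>
# element in $file. The element name "record" is an assumption; change it
# to whatever the real file uses.
sub extract_codes {
    my ($file) = @_;
    my @codes;
    my $t = XML::Twig->new(
        # Only <record> elements are built in memory; everything else
        # is parsed and skipped.
        twig_roots => {
            record => sub {
                my ($twig, $record) = @_;
                my $code = $record->att('code');
                push @codes, $code if defined $code;   # skip records without code=
                $twig->purge;   # drop finished elements to keep memory bounded
            },
        },
    );
    $t->parsefile($file);
    return @codes;
}

# Print one code per line when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for extract_codes($ARGV[0]);
}
```

Saved as, say, codes.pl, it can be run with perl codes.pl sample.xml > codes.txt. Trying it on the small sample first keeps each test run short before pointing it at the full file.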