From: "Beginner" <[EMAIL PROTECTED]> > I have a huge XML file, 1.7GB, 53080215 lines. I am trying to extract > an attribute from each record (code=). I several problems one of > which is the size of the file is making it painful to test my scripts > and methods for parsing. > > I would like to extract a few hundred records (by any means) so I can > experiment. I think XPath is the way to go here. The file > (currently) sits on a *nix system but I was going to do the parsing > to on a Win32 workstation rather than steal all the memory on a > server.
If all you want (for now) is a smaller file for testing, and the file
structure is simple (i.e. the tag immediately below the root is repeated
many times and there is no "footer" you'd need to keep), you can take the
first N megabytes, search for the last complete closing tag, throw away
everything after that tag and append the closing root tag. There is a
sketch of this at the end of this message.

To process the real file you'll definitely need something that doesn't
attempt to parse the whole file and build a huge data structure or maze
of objects. You need something stream or chunk oriented: XML::Twig (see
the second sketch below), XML::Rules (I promise that within a week I'll
release a version that doesn't waste memory on whitespace around the tags
if you don't need it, http://www.perlmonks.org/?node_id=654367; the
changes are done and seem to be working ...), some SAX oriented module,
XML::Parser, ...

And you should definitely do that processing on the same box that stores
the file ... you do not want to waste network bandwidth shipping 1.7GB
around when all you need is a few pieces of info.

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery
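PS: here is a rough, untested sketch of the truncation trick. It assumes
the repeated element is called <record>, the root is <records>, and the
input file is huge.xml; substitute whatever names your file actually uses:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read the first N bytes of the big file, cut the chunk off after
    # the last complete </record>, and re-append the closing root tag.
    my $bytes_to_keep = 5 * 1024 * 1024;    # 5 MB is plenty for testing

    open my $in, '<', 'huge.xml' or die "Cannot open huge.xml: $!";
    read $in, my $chunk, $bytes_to_keep or die "Cannot read: $!";
    close $in;

    # throw away everything after the last complete record
    my $pos = rindex $chunk, '</record>';
    die "No complete </record> in the first $bytes_to_keep bytes\n"
        if $pos < 0;
    $chunk = substr $chunk, 0, $pos + length '</record>';

    open my $out, '>', 'sample.xml' or die "Cannot write sample.xml: $!";
    print $out $chunk, "\n</records>\n";    # close the root element
    close $out;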
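And a minimal XML::Twig sketch for the real extraction, with the same
assumptions (repeated <record> elements carrying a code= attribute). The
handler fires once per record and then purges, so memory use stays flat
no matter how big the file is:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $record) = @_;
                my $code = $record->att('code');
                print "$code\n" if defined $code;
                $t->purge;    # release everything parsed so far
            },
        },
    );
    $twig->parsefile('huge.xml');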