Mike Blezien wrote:
We need to parse some very large XML files, approx. 900-1000 KB in
size. A sample of a typical XML file that would be parsed can be
viewed here: http://projects.thunder-rain.com/uploads/000001.xml
I was planning on using the XML::Twig module to do this, with the
following code snippet to loop through each of the <product> ...
</product> elements. Not every element is needed, but most of the
elements within each <product></product> are.
# Code snip:
####################################################################
use strict;
use warnings;
use CGI;
use XML::Twig;

my $xmlfile = '/path/to/upload/000001.xml';
my $cgi     = CGI->new;
my $twig    = XML::Twig->new(
    twig_handlers => {
        product => \&get_products,    # called once for each complete <product>
    },
);
$twig->parsefile($xmlfile);

sub get_products {
    my ($t, $elt) = @_;

    # Called once per <product>: pull out the simple child fields.
    my $article_number     = $elt->first_child_text('article_number');
    my $ean_upc            = $elt->first_child_text('ean_upc');
    my $distributor_number = $elt->first_child_text('distributor_number');
    my $distributor_name   = $elt->first_child_text('distributor_name');
    my $artist             = $elt->first_child_text('artist');

    # Now loop through each
    # <tracks><number_of_tracks></number_of_tracks><playtime></playtime>
    # <track> <sound> </sound> </track></tracks> for each product.
    # The <number_of_tracks> element gives the total number of
    # <track> <sound> </sound> </track> entries inside <tracks>.

    $t->purge();    # free the memory used by the elements parsed so far
}

exit();
#################################################################
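The repeated first_child_text calls in the snippet above can also be written as a loop over a list of field names, collecting each product into a hash. A minimal sketch, assuming the field names from the snippet; it parses a small hypothetical inline sample rather than 000001.xml, and the @fields list and @products array are illustrative names only:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample using the field names from the snippet above;
# the real file may contain more (or differently nested) elements.
my $xml = <<'XML';
<products>
  <product>
    <article_number>1001</article_number>
    <ean_upc>0123456789012</ean_upc>
    <distributor_number>42</distributor_number>
    <distributor_name>Example Distribution</distributor_name>
    <artist>Example Artist</artist>
  </product>
</products>
XML

# The simple child elements we want from each <product>
my @fields = qw(article_number ean_upc distributor_number distributor_name artist);

my @products;    # one hash ref per <product>

my $twig = XML::Twig->new(
    twig_handlers => {
        product => sub {
            my ($t, $elt) = @_;
            # first_child_text returns '' when the child element is missing
            my %rec = map { $_ => $elt->first_child_text($_) } @fields;
            push @products, \%rec;
            $t->purge;    # free the memory used by this <product>
        },
    },
);
$twig->parse($xml);

print "$_->{article_number}: $_->{artist}\n" for @products;
```

Adding a field is then a one-word change to @fields, and a missing element simply yields an empty string in the hash.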
The part I'm having a lot of trouble with is the elements within each
product: the <tracks> ... </tracks> element, looping through each of
its <track> child elements, and the <sound></sound> elements inside
them.
---------
<product>
.......
<tracks>
<number_of_tracks></number_of_tracks><playtime></playtime>
<track> ....
<sound> ..
</sound>
</track>
</tracks>
........
</product>
--------
Is there a better way to obtain all the data within each of the
<product> ... </product> elements? I've never really worked with XML
files this large, or with such a complex tree. Any help or suggestions
would be much appreciated.
Hi Mike
Your application of XML::Twig seems exactly right. I'm not sure what it is you
don't understand, but if you use this as your 'get_products' subroutine I hope
it answers some questions. All it does is print the title of the product and
the title of all the tracks in that product. Post again if you have any trouble
understanding what I've written.
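sub get_products {
    my ($t, $product) = @_;    # the twig and the current <product> element

    # Print the product title, if there is one
    my $product_title = $product->first_child('title');
    print $product_title->trimmed_text, "\n" if $product_title;

    # Print the title of every <track> inside <tracks>
    if (my $tracks = $product->first_child('tracks')) {
        foreach my $track ($tracks->children('track')) {
            my $track_title = $track->first_child('title');
            print '  ', $track_title->trimmed_text, "\n" if $track_title;
        }
    }
    print "\n";

    $t->purge;    # free the memory used so far, as in the original handler
}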
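Going one level deeper than the handler above, here is a minimal sketch that also walks the <sound> element(s) of each <track> and checks <number_of_tracks> against the <track> elements actually present. It parses an inline string instead of the real file; the sample data is hypothetical, and the child names inside <track> and <sound> (title, format, file) are assumptions, since only the <tracks>/<track>/<sound> nesting comes from the thread. The @summary array is an illustrative name used to collect the output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: only the <tracks>/<track>/<sound> nesting comes from
# the thread; the child names inside <track> and <sound> are made up.
my $xml = <<'XML';
<products>
  <product>
    <title>Example Album</title>
    <tracks>
      <number_of_tracks>2</number_of_tracks>
      <playtime>07:30</playtime>
      <track>
        <title>First Song</title>
        <sound><format>mp3</format><file>a.mp3</file></sound>
      </track>
      <track>
        <title>Second Song</title>
        <sound><format>mp3</format><file>b.mp3</file></sound>
      </track>
    </tracks>
  </product>
</products>
XML

my @summary;    # collected output lines

sub get_products {
    my ($t, $product) = @_;

    push @summary, 'product: ' . $product->first_child_text('title');

    if (my $tracks = $product->first_child('tracks')) {
        my $declared   = $tracks->first_child_text('number_of_tracks');
        my @track_elts = $tracks->children('track');

        # Loop over the <track> elements actually present, but warn
        # if that disagrees with the declared <number_of_tracks>.
        warn "number_of_tracks says $declared, found ", scalar(@track_elts), "\n"
            if $declared ne '' && $declared != @track_elts;

        foreach my $track (@track_elts) {
            push @summary, '  track: ' . $track->first_child_text('title');

            # A track may hold one or more <sound> elements; list their children
            foreach my $sound ($track->children('sound')) {
                foreach my $field ($sound->children) {
                    push @summary, sprintf '    %s = %s', $field->tag, $field->trimmed_text;
                }
            }
        }
    }

    $t->purge;    # release the memory used by this <product>
}

my $twig = XML::Twig->new(twig_handlers => { product => \&get_products });
$twig->parse($xml);

print "$_\n" for @summary;
```

Looping over the <track> elements themselves, rather than counting up to <number_of_tracks>, means a wrong count in the file cannot make the loop read past the end or skip tracks; the declared value is only used as a sanity check.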
HTH,
Rob