> Fourth, I tried both of these approaches plus my own, and timed them. I had
> to process 1.5 MB of data in nineteen files. Tiny. Ironically, my original
> code was the fastest at 96 seconds.
Yikes!
> The XSLT implementation came in second at 101 seconds,
Yikes again.
> and the XML::Twig implementation, while straightforward, came in last at 141 seconds. (See the attached code snippets.)
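Did you try using 'twig_roots' instead of 'TwigHandlers' in the constructor? With 'twig_roots', XML::Twig builds only the elements you list and skips the rest of the document. Also, it might run faster if you purge the twig at the end of each handler; this is supposed to release the memory held by the elements already processed.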
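# using XML::Twig
use strict;
use warnings;
use XML::Twig;

my $author_xpath = 'teiHeader/fileDesc/titleStmt/author';
my $title_xpath  = 'teiHeader/fileDesc/titleStmt/title';
my $id_xpath     = 'teiHeader/fileDesc/publicationStmt/idno';

foreach my $file (@ARGV) {
    print "Processing $file...\n";
    my ($author, $title, $id);
    # twig_roots: build only these three elements, ignore everything else.
    # Each handler receives the twig as $_[0] and the element as $_[1].
    # Purge through $_[0]: "my $twig" is not yet in scope inside its own
    # initializer, so calling $twig->purge there would fail under strict.
    my $twig = XML::Twig->new(twig_roots => {
        $author_xpath => sub { $author = $_[1]->text; $_[0]->purge },
        $title_xpath  => sub { $title  = $_[1]->text; $_[0]->purge },
        $id_xpath     => sub { $id     = $_[1]->text; $_[0]->purge },
    });
    $twig->parsefile($file);
    $twig->purge;
    print " author: $author\n title: $title\n id: $id\n\n";
}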
Have you considered using a regular expression to extract the teiHeader?
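A regular-expression version might look something like this minimal sketch, in plain Perl with no modules. It is not an XML parser: it assumes one teiHeader per file, no comments or CDATA sections containing these tags, and it leaves entities such as &amp; undecoded. The helper names header_fields and first_text are made up for this example.

```perl
#!/usr/bin/perl
# Sketch only: regex extraction of author/title/idno from a TEI header.
# Assumes simple, well-formed TEI; entities are NOT decoded.
use strict;
use warnings;

# Hypothetical helper: text of the first <$tag> inside $xml, nested tags removed.
# Always returns exactly one scalar (possibly undef), even in list context.
sub first_text {
    my ($xml, $tag) = @_;
    return undef
        unless defined $xml && $xml =~ m{<\Q$tag\E\b[^>]*>(.*?)</\Q$tag\E>}s;
    (my $text = $1) =~ s/<[^>]*>//g;    # crude removal of inline markup
    return $text;
}

# Hypothetical helper: grab the teiHeader (non-greedy, /s spans newlines),
# then mimic the XPaths titleStmt/author, titleStmt/title, publicationStmt/idno.
sub header_fields {
    my ($xml) = @_;
    my ($header) = $xml =~ m{<teiHeader\b[^>]*>(.*?)</teiHeader>}s;
    return unless defined $header;
    my ($title_stmt) = $header =~ m{<titleStmt\b[^>]*>(.*?)</titleStmt>}s;
    my ($pub_stmt)   = $header =~ m{<publicationStmt\b[^>]*>(.*?)</publicationStmt>}s;
    return {
        author => first_text($title_stmt, 'author'),
        title  => first_text($title_stmt, 'title'),
        id     => first_text($pub_stmt,   'idno'),
    };
}

binmode STDOUT, ':encoding(UTF-8)';
# Usage: perl tei_regex.pl file1.xml file2.xml ...
for my $file (@ARGV) {
    print "Processing $file...\n";
    open my $fh, '<:encoding(UTF-8)', $file or die "Cannot open $file: $!";
    my $xml = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $f = header_fields($xml) or next;
    printf " author: %s\n title: %s\n id: %s\n\n",
        map { $_ // '' } @{$f}{qw(author title id)};
}
```

Since only the header is examined, this should avoid most of the cost of parsing the body, but it will misbehave on unusual markup, so it is worth checking its output against the XML::Twig version on your nineteen files.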
Maybe the folks at PerlMonks would have some helpful suggestions.
Paul.
-- Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan :: http://www.nkuitse.com/