Re: Extracting data from an XML file

Paul Hoffman Tue, 06 Jan 2004 07:46:17 -0800

On Monday, January 5, 2004, at 10:27 PM, Eric Lease Morgan wrote:

> Fourth, I tried both of these approaches plus my own, and timed them. I had to process 1.5 MB of data in nineteen files. Tiny. Ironically, my original code was the fastest at 96 seconds.

Yikes!

> The XSLT implementation came in second at 101 seconds,

Yikes again.

> and the XML::Twig implementation, while straightforward,
> came in last at 141 seconds. (See the attached code snippets.)

Did you try using 'twig_roots' instead of 'TwigHandlers' in the constructor? That way only the matched elements are built in memory. It might also run faster if you purge the twig at the end of each handler, which is supposed to release memory.

  # using XML::Twig
  use XML::Twig;
  print "Processing $file...\n";
  my ($author, $title, $id);
  my $author_xpath = 'teiHeader/fileDesc/titleStmt/author';
  my $title_xpath  = 'teiHeader/fileDesc/titleStmt/title';
  my $id_xpath     = 'teiHeader/fileDesc/publicationStmt/idno';
  # Each handler gets the twig as $_[0] and the matched element as $_[1].
  # Use $_[0] rather than $twig: the new lexical $twig is not yet in scope
  # inside its own declaration, so the closures cannot see it.
  my $twig = XML::Twig->new(twig_roots => {
        $author_xpath => sub { $author = $_[1]->text; $_[0]->purge },
        $title_xpath  => sub { $title  = $_[1]->text; $_[0]->purge },
        $id_xpath     => sub { $id     = $_[1]->text; $_[0]->purge },
  });
  $twig->parsefile($file);
  $twig->purge;
  print "  author: $author\n   title: $title\n      id: $id\n\n";

Have you considered using a regular expression to extract the teiHeader?
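A minimal sketch of that idea, assuming each file holds a single unprefixed <teiHeader> that is not inside a comment or CDATA section (a regex cannot tell the difference). It reads each file only until the header closes, so the text body is never read. extract_tei_header is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw <teiHeader>...</teiHeader> markup
# from a TEI file, or undef if no header is found.
sub extract_tei_header {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $buffer = '';
    while (my $line = <$fh>) {
        $buffer .= $line;
        # Stop as soon as the header closes; the rest of the file is skipped.
        last if $line =~ m{</teiHeader};
    }
    close $fh;
    # Non-greedy match from the opening tag (attributes allowed) to the
    # first closing tag; /s lets '.' match across newlines.
    return $buffer =~ m{(<teiHeader\b.*?</teiHeader\s*>)}s ? $1 : undef;
}
```

The returned string could then be handed to XML::Twig->new(twig_roots => {...})->parse($header) with the same handlers as above, so only the small header fragment is actually parsed.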

Maybe the folks at PerlMonks would have some helpful suggestions.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/
