Can you suggest a fast, efficient way to use Perl to extract selected data from an XML file?
I am in the process of rewriting my Alex Catalogue of Electronic Texts. In this rewrite I will be marking up items in the collection as TEI/XML files. These files will then become my archival copies of the data, much like the TIFF files of image databases. I will repurpose the TEI files to create plain text files, HTML files, Palm documents, and PDF files, as well as provide the means for full-text, fielded, and concordance indexing and searching. Much of this work is already done for a small subset of the data, and you can see the work in progress here:

  http://infomotions.com/alex2/

To create my HTML files with rich metadata, I need to extract bits and pieces of information from the teiHeader of my originals. The snippet of code below illustrates how I am currently doing this with XML::LibXML:

  # require the necessary module
  use XML::LibXML;

  # initialize
  my $parser = XML::LibXML->new;
  my $file   = '/foo/bar.xml';

  # do the work
  my $doc    = $parser->parse_file($file);
  my $root   = $doc->getDocumentElement;
  my @header = $root->findnodes('teiHeader');
  my $author = $header[0]->findvalue('fileDesc/titleStmt/author');
  my $title  = $header[0]->findvalue('fileDesc/titleStmt/title');
  my $id     = $header[0]->findvalue('fileDesc/publicationStmt/idno');

  # output the results
  print " author: $author\n title: $title\n id: $id\n\n";

The code works, but it is really slow. Can you suggest a way to improve my code, or some other technique for extracting things like author, title, and id from my XML?

-- 
Eric Lease Morgan
University Libraries of Notre Dame
(574) 631-8604