A beta version of a Perl OAI-PMH harvesting library was just uploaded to CPAN as Net::OAI::Harvester. The idea behind Net::OAI::Harvester is to provide an object-oriented client interface to the data found in OAI-PMH repositories (similar to what LWP::UserAgent does for HTTP).

More about OAI-PMH can be found here:

  http://www.openarchives.org

And more about Net::OAI::Harvester can be found here:

  http://search.cpan.org/author/ESUMMERS/OAI-Harvester-0.1/

All six OAI-PMH verbs are supported. As an example, here is the code to retrieve a particular record from LC as Dublin Core and display the title:

  use Net::OAI::Harvester;

  # point the harvester at the LC repository
  my $harvester = Net::OAI::Harvester->new(
      baseUrl => 'http://memory.loc.gov/cgi-bin/oai2_0'
  );

  # fetch a single record as baseline Dublin Core
  my $record = $harvester->getRecord(
      identifier     => 'oai:lcoa1.loc.gov:loc.gmd/g3764s.pm003250',
      metadataPrefix => 'oai_dc'
  );

  my $metadata = $record->metadata();
  print "title: ", $metadata->title(), "\n";

Features:

- OAI-PMH responses can often be rather large XML files. Net::OAI::Harvester uses stream-based parsing (XML::SAX) and serializes data as Perl objects on disk (using YAML). This serialized data is then made available through an iterator interface, which means that you keep a relatively low memory footprint when doing ListRecords or ListIdentifiers requests.

- Net::OAI::Harvester includes Net::OAI::Record::OAI_DC, which is an XML::SAX handler for parsing and providing an object-oriented interface to baseline Dublin Core metadata. It also provides a framework for dropping in your own XML::SAX handler if you want to parse other types of metadata. The idea is that as people create their own handlers they can be easily included in the Net::OAI::Harvester distribution.

- If you are interested in the XML itself you can easily get a hold of the temporary file that contains the full XML response, and do what you want with it.

- You can easily get a hold of the error code and message associated with any request.

Caveats:

- Net::OAI::Harvester only supports OAI-PMH version 2.

- No support for compression (yet).

- Needs more documentation and examples.

- You need to handle resumptionTokens explicitly. This means a call to listRecords() will not go and grab everything, but just the first chunk. However, there are infrastructure and methods to easily get at and pass the tokens.

Feedback, comments and testers would be appreciated. If you are at all interested in getting involved in the project please write to me directly, or (preferably) use [EMAIL PROTECTED] or [EMAIL PROTECTED]

//Ed
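
For readers wondering what handling resumptionTokens explicitly might look like, here is a minimal sketch, not code from the distribution. It assumes the list object returned by listRecords() offers next(), resumptionToken(), errorCode() and errorString(), and that the token object offers token(), as in the module's later documentation; these names may differ in the 0.1 beta. harvest_all() and its per-record callback are hypothetical helpers introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk every chunk of a ListRecords response by following resumptionTokens.
# $harvester is assumed to behave like a Net::OAI::Harvester object;
# $callback is a hypothetical hook that is called once per record.
sub harvest_all {
    my ( $harvester, $callback, %opts ) = @_;
    my $count   = 0;
    my $records = $harvester->listRecords(%opts);

    while ($records) {
        # Give up if the repository reported an error for this chunk.
        if ( my $code = $records->errorCode() ) {
            die "OAI error $code: " . $records->errorString() . "\n";
        }

        # Iterate over the records in the current chunk only.
        while ( my $record = $records->next() ) {
            $callback->($record);
            $count++;
        }

        # A missing or empty token means this was the last chunk.
        my $token = $records->resumptionToken();
        my $value = $token ? $token->token() : undef;
        last unless defined $value && length $value;

        # Ask for the next chunk by passing the token back.
        $records = $harvester->listRecords( resumptionToken => $value );
    }
    return $count;
}

# Only contact a repository when a base URL is given on the command line.
if (@ARGV) {
    require Net::OAI::Harvester;

    # 'baseUrl' follows the example in this post; check which spelling
    # your installed version of the module expects.
    my $harvester = Net::OAI::Harvester->new( baseUrl => $ARGV[0] );

    my $total = harvest_all(
        $harvester,
        sub {
            my $metadata = $_[0]->metadata();
            print "title: ", $metadata->title(), "\n" if $metadata;
        },
        metadataPrefix => 'oai_dc',
    );
    print "harvested $total records\n";
}
```

Run it with a repository base URL as the only argument, for example the LC URL used earlier in this post. Without an argument the script only defines the helper and exits, which keeps the loop easy to try against a stand-in object before pointing it at a live repository.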