Re: RFC: OAI::Harvester
On Mon, 7 Jul 2003 12:25:46 -0400, Ed Summers wrote:
> The OAI::Harvester disto is made up of 14 separate packages. For the sake
> of preserving my existing namespaces I'd like to upload these into a
> top-level namespace of OAI::, but Metadata:: could work as well.

Looks like you've gone ahead with Net::OAI::, which seems reasonable to me. Of course, strictly speaking, it should be Net::OAI::PMH::, but whenever I've heard people talking about OAI, they always leave the PMH bit off, which suits me, given my initials :-)

-- 
Peter Haworth [EMAIL PROTECTED]
"ACRONYM = A Capitalized Representation Of Names You Memorize" -- Bartman
Net::OAI::Harvester
A beta version of a Perl OAI-PMH harvesting library was just uploaded to CPAN as Net::OAI::Harvester. The idea behind Net::OAI::Harvester is to provide an object-oriented client interface to the data found in OAI-PMH repositories (similar to what LWP::UserAgent does for HTTP).

More about OAI-PMH can be found here:

    http://www.openarchives.org

And more about Net::OAI::Harvester can be found here:

    http://search.cpan.org/author/ESUMMERS/OAI-Harvester-0.1/

All six OAI-PMH verbs are supported. As an example, here is the code to retrieve a particular record from LC as Dublin Core and display the title.

    use Net::OAI::Harvester;

    my $harvester = Net::OAI::Harvester->new(
        baseUrl => 'http://memory.loc.gov/cgi-bin/oai2_0'
    );

    my $record = $harvester->getRecord(
        identifier     => 'oai:lcoa1.loc.gov:loc.gmd/g3764s.pm003250',
        metadataPrefix => 'oai_dc'
    );

    my $metadata = $record->metadata();
    print "title: ", $metadata->title(), "\n";

Features:

- OAI-PMH responses can often be rather large XML files. Net::OAI::Harvester uses stream-based parsing (XML::SAX) and serializes data as Perl objects on disk (using YAML). This serialized data is then made available through an iterator interface, which means that you keep a relatively low memory footprint when doing ListRecords or ListIdentifiers requests.

- Net::OAI::Harvester includes Net::OAI::Record::OAI_DC, which is an XML::SAX handler for parsing and providing an object-oriented interface to baseline Dublin Core metadata. It also provides a framework for dropping in your own XML::SAX handler if you want to parse other types of metadata. The idea is that as people create their own handlers they can be easily included in the Net::OAI::Harvester distribution.

- If you are interested in the XML itself, you can easily get hold of the temporary file that contains the full XML response, and do what you want with it.

- You can easily get hold of the error code and message associated with any request.

Caveats:

- Net::OAI::Harvester only supports OAI-PMH v.2.

- No support for compression (yet).

- Needs more documentation and examples.

- You need to handle resumptionTokens explicitly. This means a call to listRecords() will not go and grab everything, but just the first chunk. However, there are infrastructure and methods to easily get at and pass the tokens.

Feedback, comments, and testers would be appreciated. If you are at all interested in getting involved in the project, please write to me directly, or (preferably) use [EMAIL PROTECTED] or [EMAIL PROTECTED]

//Ed
MARC::Record and the subfields in the 650
Hi all:

I have a question about extracting the subfields from the 650 in the proper order. Basically, I have a number of records that contain 650s with $x Periodicals. I need to change that subfield's code from x to v. The problem I am running into is that I don't know how to keep the subfields in the proper order. I have some records that have 650 $a $x $y $x and others that have 650 $a $x $x. It is this last $x that I need to change to a $v.

Can someone offer some advice on modifying subfields that need to stay in the order found in the record?

Thanks.

Michael

Michael L. Bowden
Coordinator of Automation and Access Services,
Assistant Professor of Information Science
Harrisburg Area Community College
[EMAIL PROTECTED]
717.780.1936
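
One possible approach to the question above, offered as a minimal sketch and not as a tested solution: treat the subfields as an ordered list of [code, value] pairs, change only the code of the last $x, and rebuild the field from that list so the order is untouched. The helper recode_last_x() is hypothetical (it is not part of MARC::Record), and the sample headings are made up. The commented wiring assumes that subfields(), indicator(), replace_with() and MARC::Field->new() behave as described in the MARC::Field documentation, so check it against your installed version.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Record). Takes subfields as a list
# of [code, value] pairs in record order and returns a copy in which the
# last $x is recoded to $v, but only if its value starts with "Periodicals".
sub recode_last_x {
    my @subfields = map { [ @$_ ] } @_;    # copy, so the input is not modified
    for my $i ( reverse 0 .. $#subfields ) {
        next unless $subfields[$i][0] eq 'x';
        $subfields[$i][0] = 'v' if $subfields[$i][1] =~ /^Periodicals\b/;
        last;    # only the last $x is ever considered
    }
    return @subfields;
}

# Possible wiring into MARC::Record, assuming $record is a MARC::Record:
#
#   for my $field ( $record->field('650') ) {
#       my @new = recode_last_x( $field->subfields() );
#       $field->replace_with( MARC::Field->new(
#           '650', $field->indicator(1), $field->indicator(2),
#           map { @$_ } @new,
#       ) );
#   }

# Stand-alone demonstration with made-up data.
my @result = recode_last_x(
    [ 'a', 'Education' ],
    [ 'x', 'Research' ],
    [ 'y', '20th century' ],
    [ 'x', 'Periodicals.' ],
);
print join( ' ', map { sprintf( '$%s %s', @$_ ) } @result ), "\n";
```

Because only the code of one pair changes and the list is never re-sorted, 650 $a $x $y $x becomes $a $x $y $v and 650 $a $x $x becomes $a $x $v. A field whose last $x is something other than Periodicals is returned unchanged.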