Re: RFC: OAI::Harvester

2003-07-08 Thread Peter Haworth
On Mon, 7 Jul 2003 12:25:46 -0400, Ed Summers wrote:
> The OAI::Harvester distro is made up of 14 separate packages. For the sake
> of preserving my existing namespaces I'd like to upload these into a
> top-level namespace of OAI::, but Metadata:: could work as well.

Looks like you've gone ahead with Net::OAI::, which seems reasonable to me.

Of course, strictly speaking, it should be Net::OAI::PMH::, but whenever
I've heard people talking about OAI, they always leave the PMH bit off,
which suits me, given my initials :-)

-- 
Peter Haworth   [EMAIL PROTECTED]
"ACRONYM = A Capitalized Representation Of Names You Memorize" 
-- Bartman


Net::OAI::Harvester

2003-07-08 Thread Ed Summers
A beta version of a Perl OAI-PMH harvesting library was just uploaded to
CPAN as Net::OAI::Harvester. The idea behind Net::OAI::Harvester is to
provide an object-oriented client interface to the data found in OAI-PMH 
repositories (similar to what LWP::UserAgent does for HTTP). 

More about OAI-PMH can be found here:
 http://www.openarchives.org

And more about Net::OAI::Harvester can be found here:
 http://search.cpan.org/author/ESUMMERS/OAI-Harvester-0.1/

All six OAI-PMH verbs are supported. As an example, here is the code to
retrieve a particular record from LC as Dublin Core and display the title.

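 use strict;
 use warnings;
 use Net::OAI::Harvester;

 # point the harvester at the LC repository
 my $harvester = Net::OAI::Harvester->new(
  baseUrl => 'http://memory.loc.gov/cgi-bin/oai2_0'
 );

 # GetRecord: fetch a single record as baseline Dublin Core
 my $record = $harvester->getRecord(
  identifier => 'oai:lcoa1.loc.gov:loc.gmd/g3764s.pm003250',
  metadataPrefix => 'oai_dc'
 );

 # the metadata object exposes the Dublin Core elements as methods
 my $metadata = $record->metadata();
 print "title: ", $metadata->title(), "\n";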

Features:

- OAI-PMH responses can often be rather large XML files. Net::OAI::Harvester 
  uses stream-based parsing (XML::SAX) and serializes data as Perl objects on 
  disk (using YAML). This serialized data is then made available through
  an iterator interface, which means that you keep a relatively low
  memory footprint when doing ListRecords or ListIdentifiers requests.

- Net::OAI::Harvester includes Net::OAI::Record::OAI_DC, which is an
  XML::SAX handler for parsing and providing an object-oriented
  interface to baseline Dublin Core metadata. It also provides a
  framework for dropping in your own XML::SAX handler if you want to
  parse other types of metadata. The idea is that as people create their
  own handlers they can be easily included in the Net::OAI::Harvester
  distribution.

- If you are interested in the XML itself you can easily get a hold of the 
  temporary file that contains the full XML response, and do what you want 
  with it.

- You can easily get hold of the error code and message associated with any 
  request.

Caveats:

- Net::OAI::Harvester only supports OAI-PMH v.2.

- No support for compression (yet).

- Needs more documentation and examples.

- You need to handle resumptionTokens explicitly. This means a call to 
  listRecords() will not go and grab everything, but just the first chunk. 
  However, there are infrastructure and methods to easily get at and pass
  the tokens.
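
To make the resumptionToken caveat concrete, here is a minimal sketch of
the loop a caller ends up writing. It is self-contained and does not touch
the network: fetch_chunk() and the %pages table are hypothetical stand-ins
for a single ListRecords request and are not part of Net::OAI::Harvester.
In real use the callback would wrap listRecords() and whatever accessor
your installed version provides for the token.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one ListRecords request. It takes a
# resumption token (undef for the first request) and returns the
# records in that chunk plus the token for the next chunk.
my %pages = (
    ''      => { records => [ 'rec1', 'rec2' ], next => 'tok-1' },
    'tok-1' => { records => [ 'rec3', 'rec4' ], next => 'tok-2' },
    'tok-2' => { records => [ 'rec5' ],         next => '' },
);

sub fetch_chunk {
    my ($token) = @_;
    my $page = $pages{ defined $token ? $token : '' };
    return ( $page->{records}, $page->{next} );
}

# Keep requesting chunks until no usable token comes back. OAI-PMH
# marks the final chunk with an empty resumptionToken, so an empty
# string ends the loop as well as undef.
sub harvest_all {
    my ($fetch) = @_;
    my @all;
    my $token;
    do {
        my ( $records, $next ) = $fetch->($token);
        push @all, @$records;
        $token = $next;
    } while ( defined $token && length $token );
    return @all;
}

my @records = harvest_all( \&fetch_chunk );
print scalar(@records), " records harvested\n";
```

Passing the fetch step in as a code reference keeps the paging logic
separate from the request itself, so the same loop should serve for
ListIdentifiers as well.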

Feedback, comments, and testers would be appreciated. If you are at all
interested in getting involved in the project, please write to me directly
or (preferably) use [EMAIL PROTECTED] or [EMAIL PROTECTED]

//Ed


MARC::Record and the subfields in the 650

2003-07-08 Thread Michael Bowden
Hi all:
 
I have a question about extracting the subfields from the 650 in the
proper order.  Basically, I have a number of records that contain 650s
with $x Periodicals.  I need to modify the subfield from x to v.  The
problem I am running into is that I don't know how to keep the subfields
in the proper order.  I have some records that have 650 $a $x $y $x and
others that have 650 $a $x $x.  It is this last $x that I need to change
to a $v.  Can someone offer some advice on modifying subfields that need
to stay in the order found in the record?
 
Thanks.
 
Michael

 
 
Michael L. Bowden
Coordinator of Automation and Access Services,
Assistant Professor of Information Science
Harrisburg Area Community College
[EMAIL PROTECTED]
717.780.1936
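
One possible approach to the question above, offered as a sketch rather
than a tested recipe: treat the field's subfields as an ordered list of
[code, value] pairs, change only the code of the last matching pair, and
rebuild the field from the list so the order is preserved. The function
name recode_last_subfield() and the sample data are made up for
illustration, and the code below uses plain Perl only.

```perl
use strict;
use warnings;

# Takes one field's subfields as an ordered list of [code, value]
# pairs and returns a new list in the same order, with the code of
# the LAST pair matching ($from, $pattern) changed to $to.
sub recode_last_subfield {
    my ( $subfields, $from, $to, $pattern ) = @_;
    my @out = map { [@$_] } @$subfields;    # copy, leave the input alone
    for my $i ( reverse 0 .. $#out ) {
        my ( $code, $value ) = @{ $out[$i] };
        if ( $code eq $from && $value =~ $pattern ) {
            $out[$i][0] = $to;
            last;                           # only the last match changes
        }
    }
    return @out;
}

# Illustrative 650 with two $x subfields; the values are invented.
my @field_650 = (
    [ a => 'Libraries' ],
    [ x => 'Automation' ],
    [ x => 'Periodicals.' ],
);

my @fixed = recode_last_subfield( \@field_650, 'x', 'v', qr/^Periodicals/ );
print join( ' ', map { "\$$_->[0] $_->[1]" } @fixed ), "\n";
```

With MARC::Record, the pairs would presumably come from
$field->subfields(), and the flattened result (map { @$_ } @fixed) would go
into a new MARC::Field that replaces the old one via
$field->replace_with(). Please check both calls against the documentation
for your installed version before relying on them.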