Warnings during decode() of raw MARC

2004-08-18 Thread Bryan Baldus
I'm probably missing something obvious, but I have been unsuccessful in
trying to capture the warnings reported by MARC::Record that are set by
MARC::File::USMARC->decode(). Is there a simple way to store the warnings
reported during the decode() process (using a MARC::Record or MARC::Batch
object)?

When MARC::Record/Batch call next() on a record with problems reported
during decoding, the warning (for example, warnings about indicators being
forced to blanks, or about entirely empty subfields) prints to the screen
(STDERR?), but I can't seem to directly capture those warnings to an array
(for later reporting). I have tried using:

# my $batch = MARC::Batch->new('USMARC', $inputfile);
my @basewarnings = $batch->warnings();
or
# my $record = $batch->next();
my @basewarnings = $record->warnings();

push @haswarnings, @basewarnings;

Both seem to fail to capture the warnings reported by MARC::File::USMARC.

As a workaround, I wrote rawanddecodedscan.pl [1], which works well enough
as a stand-alone program. However, now I would like to integrate those
warnings into the warnings reported by my lintallchecks.pl (and related
MARC::Lint-based) programs [2].

In the Raw and Decoded Scan program, I got around the problem by opening the
file of MARC records twice and then going through each record
simultaneously, decoding one manually and the other using MARC::Record.
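For the "raw" side of that comparison, one way to pull out undecoded records is to set Perl's input record separator to the MARC record terminator (hex 1D). This is a minimal sketch, assuming standard USMARC/MARC 21 files where leader positions 00-04 give the total record length; `read_raw_records` is a hypothetical helper, not part of MARC::Record or MARC::Batch.

```perl
use strict;
use warnings;

# Hypothetical helper: read raw USMARC records one at a time by using the
# MARC record terminator (0x1D) as the input record separator.
sub read_raw_records {
    my ($fh) = @_;
    my @raw;
    local $/ = "\x1D";
    while ( my $rec = <$fh> ) {
        next unless $rec =~ /\S/;    # skip trailing whitespace after the last record
        push @raw, $rec;
    }
    return @raw;
}

# Two tiny fake records: the first five bytes hold the declared record
# length, which includes the terminator itself.
my $data = "00010abcd\x1D" . "00012abcdef\x1D";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @raw = read_raw_records($fh);

for my $r (@raw) {
    my $declared = substr( $r, 0, 5 );
    printf "declared %d, actual %d\n", $declared, length $r;
}
```

In a real script the filehandle would come from something like `open my $fh, '<:raw', $inputfile`, and each raw string could be checked by hand while the same file is read through MARC::Batch.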

Thank you for your assistance,

Bryan Baldus
http://home.inwave.com/eija/
[EMAIL PROTECTED]

##

[1] Raw and Decoded Scan script:
http://home.inwave.com/eija/fullrecscripts/rawanddecodedscan.txt

[2] Lint All Checks script:
http://home.inwave.com/eija/fullrecscripts/lintallchecks.txt


RE: Warnings during decode() of raw MARC

2004-08-18 Thread Houghton,Andrew
> From: Bryan Baldus [mailto:[EMAIL PROTECTED] 
> Sent: 18 August, 2004 09:24
> Subject: Warnings during decode() of raw MARC
> 
> I'm probably missing something obvious, but I have been 
> unsuccessful in trying to capture the warnings reported by  
> MARC::Record that are set by MARC::File::USMARC->decode(). Is 
> there a simple way to store the warnings reported during the 
> decode() process (using a MARC::Record or MARC::Batch object)? 
> 

How about this technique:

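#!perl
use strict;
use warnings;

package main;

# Stand-in for code (such as a decode) that writes to STDERR.
sub out {

  print STDERR "ERR: Error 1\n";
  print STDERR "ERR: Error 2\n";
  print STDERR "ERR: Error 3\n";
  return;
}

sub main {

  my @errs = ();

  # Save the real STDERR, then send STDERR to a temporary file.
  open SAVERR, ">&STDERR" or die "Cannot dup STDERR: $!";
  open STDERR, ">errors.txt" or die "Cannot redirect STDERR: $!";

  out();          # anything written to STDERR now lands in errors.txt
  close STDERR;   # flush the file before reading it back

  # Read the captured messages back into an array.
  open GETERR, "<errors.txt" or die "Cannot read errors.txt: $!";
  while (<GETERR>) {
    push(@errs,$_);
  }

  close(GETERR);

  # Restore the original STDERR.
  open STDERR, ">&SAVERR" or die "Cannot restore STDERR: $!";

  print STDOUT scalar(@errs),"\n";
  return 0;
}

exit main::main;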
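If the decode messages are raised through Perl's `warn` or `carp` (an assumption, though it is consistent with them appearing on STDERR), they can also be collected in memory with a `$SIG{__WARN__}` handler instead of a temporary file. A minimal sketch; `noisy_decode` is a hypothetical stand-in for the real call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a decode that complains via warn().
sub noisy_decode {
    warn "Invalid indicator \"x\" forced to blank\n";
    warn "Entirely empty subfield found\n";
    return 'record';
}

my @captured;
my $result;
{
    # local() restores the previous handler when this block exits.
    local $SIG{__WARN__} = sub { push @captured, $_[0] };
    $result = noisy_decode();
}

print scalar(@captured), " warning(s) captured\n";
print "  $_" for @captured;
```

In practice the call inside the block would be the `$batch->next()` that triggers the decode. Messages written with a plain `print STDERR` bypass this hook, so the file redirect above remains the catch-all.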



Net::Z3950, OPAC record syntax & multiple MFHD 866

2004-08-18 Thread Michael D Doran
Please excuse the cross-posting (perl4lib & Net-z3950).

I am working with a Perl script designed to query our catalog via Net::Z3950
and retrieve a journal record.  The OPAC record syntax is specified because
the ultimate point of the script [1] is to parse the journal holdings to
determine if a particular year is owned by our library.  Our holdings (MFHD)
records often contain multiple 866 fields (which contain the actual holdings
info); however, Net::Z3950 only returns the *last* 866 from a MFHD record,
thereby giving an incomplete list of holdings.  

Below is the relevant code:
 
  use strict;
  use warnings;
  use Net::Z3950;

  my $issn         = '0028-0836';
  my $query        = '@attr 1=8 ' . $issn;   # Bib-1 use attribute 8 = ISSN
  my $target       = 'pulse.uta.edu';
  my $port         = 7099;
  my $database     = 'pulse';
  my $recordSyntax = 'OPAC';
  my $conn = Net::Z3950::Connection->new($target, $port,
                                         databaseName => $database);
  my $rs = $conn->search(-prefix => $query);
  $rs->option(preferredRecordSyntax => $recordSyntax);
  for ( my $i = 1; $i <= $rs->size(); $i++ ) {
    my $rec  = $rs->record($i);
    my $marc = $rec->render();
    print $marc;
  }

If I search for the journal Nature (ISSN 0028-0836), which in our catalog has
these multiple 866s in the first holdings record:

  866  0 _av.253(1975)-v.344(1990:Apr.),
  866  0 _av.345(1990)-v.426(2003:Nov.20),
  866  0 _av.426(2003:Dec.)-v.429(2004:May)
  866  0 _aINDEXES v.277(1979)-v.348(1990),v.403-408(2000),v.415(2002)-v.426(2003)

...I get this MARC data returned by Net::Z3950.  Note the "enumAndChron"
line which contains the 866 info.

* Bibliographic record:

245  00  $aNature.
260  $a[London, etc.,$bMacmillan Journals ltd.]

* Holdings record 1 of 4:
typeOfRecord: y
encodingLevel: 4
receiptAcqStatus: 4
generalRetention: 8
completeness: 4
dateOfReport: 00
nucCode: sel,per
localLocation: Science & Engineering Library: Periodicals
callNumber: Q 1
enumAndChron: ^_aINDEXES v.277(1979)-v.348(1990),v.403-408(2000),v.415(2002)-v.426(2003)


As you can see, Net::Z3950 only returns the last 866 field.
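To illustrate why losing all but the last 866 breaks a "do we own this year?" check, here is a minimal sketch of a year test over textual holdings statements. It is hypothetical and is not the SFX plug-in's actual logic: it assumes each comma-separated piece is either a single issue such as `v.403-408(2000)` or a range whose first and last parenthesized years bound it, and it treats a piece ending in `-` as open-ended.

```perl
use strict;
use warnings;

# Hypothetical helper: does any 866 $a statement cover $year?
sub year_held {
    my ( $year, @statements ) = @_;
    for my $stmt (@statements) {
        for my $piece ( split /,/, $stmt ) {
            my @years = $piece =~ /\((\d{4})/g;    # years inside "(YYYY..."
            next unless @years;
            my $open = $piece =~ /-\s*$/;          # e.g. "v.426(2003:Dec.)-"
            my ( $from, $to ) = ( $years[0], $open ? 9999 : $years[-1] );
            return 1 if $year >= $from && $year <= $to;
        }
    }
    return 0;
}

my @all_866 = (
    'v.253(1975)-v.344(1990:Apr.),',
    'v.345(1990)-v.426(2003:Nov.20),',
    'v.426(2003:Dec.)-v.429(2004:May)',
);
my @last_only = ( $all_866[-1] );

printf "1985 with all 866s: %s\n", year_held( 1985, @all_866 )   ? 'held' : 'not held';
printf "1985 with last 866: %s\n", year_held( 1985, @last_only ) ? 'held' : 'not held';
```

With only the last statement available, a year such as 1985 is wrongly reported as not held.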

So my questions are:
1) Has anyone else noticed/experienced this behavior (i.e. only getting the
last 866)?  I'm trying to determine if this behavior is unique to how I am
implementing/configuring Net::Z3950 and/or if it is ILMS specific.  This is
my first time using Net::Z3950, so if I'm doing something wrong, please
correct me.

2) Is this behavior by design or is it a bug?  According to the MARC
standard, the MFHD 866 is repeatable [2].  Please disregard the fact that we
have Index holdings in the 866 rather than the 868 ...or why we are using
multiple 866 even for regular holdings.  Those issues are not under my
control.

3) If it is a bug, is it in Net::Z3950, in the Z39.50 protocol, or in
the Voyager Z39.50 implementation/API?  (I have limited experience with
Z39.50 and the only other client I have, BookWhere, does not appear to offer
the "OPAC" record syntax.)  If it is in the Net::Z3950 module can it be
fixed?  :-)

I have browsed the Net-z3950 listserv archive back to September 2003 (when
version 0.36, which added support for the OPAC record syntax, was released)
and didn't see any mention of this behavior.

Our software and versions:
  Net::Z3950 version 0.39 (on Solaris)
  Our ILMS is Endeavor's Voyager, version 2001.2

Thanks!

-- Michael

[1] The script is designed as an SFX plug-in and was written by David Walker
of Cal State San Marcos
http://library.csusm.edu/csu/sfx/local_holding_chameleon.asp
[2] MARC 21 Concise Holdings: Textual Holdings Statement Fields (866-868)
http://www.loc.gov/marc/holdings/echdtext.html

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/


Re: Warnings during decode() of raw MARC

2004-08-18 Thread Ed Summers
On Wed, Aug 18, 2004 at 08:23:59AM -0500, Bryan Baldus wrote:
> Both seem to fail to capture the warnings reported by MARC::File::USMARC.

There appears to be a bug in MARC::Batch::next() code at line 123 which
extracts the warnings from the newly instantiated MARC::Record object
and stuffs them into the MARC::Batch object so that they are available
at that level.

my @warnings = $rec->warnings();

The bug is that calling warnings() clears the warnings storage in the
MARC::Record object as a side effect. MARC::Batch should probably sidestep
calling warnings() and dig into the object directly, or there should be
another method that doesn't zap the storage.
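The side effect is easier to see in a toy model. This is a simplified, hypothetical sketch of the behavior described above, not MARC::Record's or MARC::Batch's actual code: `Toy::Record::warnings()` empties its list when read, so a batch that copies warnings by calling it leaves the record with none, while copying the stored list directly leaves both intact.

```perl
use strict;
use warnings;

# Hypothetical record class: reading warnings() also clears them.
package Toy::Record;
sub new   { return bless { _warnings => [] }, shift }
sub _warn { my $self = shift; push @{ $self->{_warnings} }, join '', @_; return $self }
sub warnings {
    my $self = shift;
    my @w = @{ $self->{_warnings} };
    $self->{_warnings} = [];    # the side effect that causes the bug
    return @w;
}

package Toy::Batch;
sub new { return bless { _warnings => [] }, shift }

# Buggy copy: warnings() drains the record as it is read.
sub next_buggy {
    my ( $self, $rec ) = @_;
    push @{ $self->{_warnings} }, $rec->warnings();
    return $rec;
}

# Fixed copy: read the stored list without clearing it.
sub next_fixed {
    my ( $self, $rec ) = @_;
    push @{ $self->{_warnings} }, @{ $rec->{_warnings} };
    return $rec;
}

package main;

my $r1 = Toy::Record->new->_warn('Invalid indicator');
Toy::Batch->new->next_buggy($r1);
my @after_buggy = $r1->warnings();    # empty: already drained by the batch

my $r2 = Toy::Record->new->_warn('Invalid indicator');
my $b2 = Toy::Batch->new;
$b2->next_fixed($r2);
my @after_fixed = $r2->warnings();    # still holds the warning
```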

Let me see if I can duplicate this problem in a test, and then see if
the fix actually works. If you can provide a .t file for the
MARC::Record distribution that would be handy too :-)

//Ed


RE: Net::Z3950, OPAC record syntax & multiple MFHD 866 - SOLVED

2004-08-18 Thread Michael D Doran
I have been informed that this is a Voyager ILMS Z39.50 server bug.  (Thanks
Sandy!)

Sorry for the false alarm... didn't mean to cast any aspersions on
Net::Z3950!

-- Michael

> -Original Message-
> From: Michael D Doran 
> Sent: Wednesday, August 18, 2004 11:23 AM
> To: '[EMAIL PROTECTED]'; [EMAIL PROTECTED]
> Subject: Net::Z3950, OPAC record syntax & multiple MFHD 866


Re: Warnings during decode() of raw MARC

2004-08-18 Thread Ed Summers
On Wed, Aug 18, 2004 at 12:56:17PM -0500, Bryan Baldus wrote:
> Ok, I'll try to provide a test file. However, since I've been using MacPerl
> (and Windows), I don't usually deal with test files (the usual 
> perl Makefile.PL
> make
> make test
> make install
> doesn't usually work for me, though occasionally I will run the tests
> one-by-one). 

How do you typically do the install? MARC::Record is included at the
ActiveState PPM repository, so these steps should happen on a Windows
platform...assuming nmake or some other make variant is being used.

> So, this .t file, should it just be a raw USMARC/MARC21 format
> record exhibiting one of the problems that generates a warning during the
> decode process (probably not, since you said .t rather than .usmarc), or
> does the .t file need to include calling code, and if so, is the Test module
> tutorial (Test::Tutorial) the best place to look for creating such a .t
> file?

Take a look in the t/ subdirectory of the MARC::Record package. These
are the tests that get run automatically during a 'make test'. There are
lots of resources available for learning how to write perl unit tests
[1,2,3].

After your email this morning I added some tests to expose the bug in 
MARC::Batch::next(). 

use strict;
use warnings;
use Test::More tests => 4;
use MARC::Batch;

## create a batch object for a file that contains a record
## with a bad indicator.
my $batch = MARC::Batch->new( 'USMARC', 't/badind.usmarc' );
$batch->warnings_off();
$batch->strict_off();
my $r = $batch->next();

## check the warnings on the batch
my @warnings = $batch->warnings();
is( scalar @warnings, 1, 'got expected amt of warnings off the batch' );
like( $warnings[0], qr/^Invalid indicator/,
    'got expected err msg off the batch' );

## same exact warning should be available on the record
@warnings = $r->warnings();
is( scalar @warnings, 1, 'got expected amt of warnings off the record' );
like( $warnings[0], qr/^Invalid indicator/,
    'got expected err msg off the record' );

Basically just read in a record with some messed up indicators, and made
sure that it could get the warning from the batch object AND the record
object. Sure enough it failed, and after I applied my change it worked
fine. Then I ran the entire test suite and it ran fine as well. Having a
test suite like this lets you make changes without losing too much sleep
worrying about the rippling effects of changes.

//Ed

[1] http://www.wgz.com/chromatic/perl/IntroTestMore.pdf 
[2] http://magnonel.guild.net/~schwern/talks/Test_Tutorial/Test-Tutorial.pdf
[3] http://www.petdance.com/perl/automated-testing/ 


Re: delete_subfields()

2004-08-18 Thread moregan

MARC::Field->as_string() takes a string of subfields rather than an array.
It would be better for as_string() and delete_subfields() to have the same
interface.  Since as_string() is used a lot in production,
delete_subfields() should be the one to change.

Mike O'Regan


From: Ed Summers <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: 08/17/2004 10:09 AM
Subject: delete_subfields()

Jackie Shieh at Univ of Michigan thought it would be handy to have a
delete_subfields() method on MARC::Field objects. Basically the
method takes a list of subfields to delete, and deletes each one,
returning the total number of subfields that were removed. If a subfield
repeats, all occurrences are deleted.

$field->delete_subfields( 'z' );
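A minimal sketch of the behavior as first described here (delete every occurrence of each listed code, return the count). It models a field as a plain flat list of code/value pairs and uses a hypothetical standalone function; it is not MARC::Field's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove every occurrence of each listed subfield code
# from a flat [code, value, code, value, ...] list; return how many went.
sub delete_subfields {
    my ( $pairs, @codes ) = @_;
    my %doomed = map { $_ => 1 } @codes;
    my @kept;
    my $removed = 0;
    for ( my $i = 0; $i < @$pairs; $i += 2 ) {
        my ( $code, $value ) = @{$pairs}[ $i, $i + 1 ];
        if ( $doomed{$code} ) { $removed++ }
        else                  { push @kept, $code, $value }
    }
    @$pairs = @kept;
    return $removed;
}

my @subfields = ( a => 'Nature.', z => 'note 1', b => 'x', z => 'note 2' );
my $count = delete_subfields( \@subfields, 'z' );
print "removed $count; left: @subfields\n";
```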

I've committed the new method and a few tests to SourceForge if
anyone is interested in taking a look. [1]

//Ed

[1] http://prdownloads.sourceforge.net/marcpm/MARC-Record-1.4.tar.gz

Re: delete_subfields()

2004-08-18 Thread Andy Lester
On Wed, Aug 18, 2004 at 03:34:11PM -0500, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> MARC::Field->as_string() takes a string of subfields rather than an array.
> It would be better for as_string() and delete_subfields() to have the same
> interface.  Since as_string() is used a lot in production,
> delete_subfields() should be the one to change.

Or they could be smart enough to Do The Right Thing.
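One way a method could "Do The Right Thing" is to normalize its arguments first. This is a hypothetical sketch, not what MARC::Field does: because subfield codes are single characters, splitting every argument into characters makes `'az'` (the as_string() style), `('a', 'z')`, and mixtures of the two equivalent.

```perl
use strict;
use warnings;

# Hypothetical helper: accept codes as one string ('az'), as a list
# ('a', 'z'), or a mix, and return a flat list of single-character codes.
sub normalize_codes {
    return map { split //, $_ } @_;
}

my @from_string = normalize_codes('az');
my @from_list   = normalize_codes( 'a', 'z' );
print "@from_string | @from_list\n";
```

A method would then delete (or print) each code in the normalized list, so existing as_string()-style callers and list-style callers both keep working.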

xoa

-- 
Andy Lester => [EMAIL PROTECTED] => www.petdance.com => AIM:petdance


Re: delete_subfields()

2004-08-18 Thread Ed Summers
On Wed, Aug 18, 2004 at 03:34:11PM -0500, [EMAIL PROTECTED] wrote:
> MARC::Field->as_string() takes a string of subfields rather than an array.
> It would be better for as_string() and delete_subfields() to have the same
> interface.  Since as_string() is used a lot in production,
> delete_subfields() should be the one to change.

Nice catch, keeping things consistent is a good thing. I've made the
change and checked it in to CVS.

//Ed


RE: Warnings during decode() of raw MARC

2004-08-18 Thread Bryan Baldus
>How do you typically do the install? MARC::Record is included at the
>ActiveState PPM Repository, so it should do these things on a Windows
>platform...assuming nmake or some sort of make variant is being used. 

At home (on the Mac), I just drop the MARC folder in my site_lib folder in
the MacPerl folder (MacPerl adds site_perl to @INC automatically, I
believe). This is after I expand the tar.gz files with Stuffit Expander.
There used to be an installer.pl for Mac, but it no longer works with the
current version of MacPerl (5.8.0a). Since the MacPerl documentation
seems to indicate that MacPerl is not compatible with MakeMaker, drag-and-drop
installation seems to be the easiest alternative. To convert line endings, I
use either BBEdit Lite, a third-party program, or a script I just wrote that
should convert line endings and change the Type and Creator (to TEXT and
BBEdit).

In Windows, I take the folder from home, convert the line endings from Mac
to DOS, and then drop the MARC folder in C:\Perl\site\lib\. 
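The Mac-to-DOS line-ending step can be sketched in a few lines of Perl. This is a hypothetical helper, not Bryan's script, and it does not set the Mac Type/Creator codes. It uses octal escapes rather than `\r`/`\n` because MacPerl maps `\n` differently from Unix and Windows Perl, and it lists CRLF first in the alternation so existing DOS line endings are not doubled.

```perl
use strict;
use warnings;

# Hypothetical sketch: normalize CR (Mac), LF (Unix) and CRLF (DOS)
# line endings to CRLF.
sub to_dos {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\015\012/g;
    return $text;
}

# Rewrite a file in place; binary (:raw) I/O keeps Perl's own newline
# translation out of the way.
sub convert_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} to_dos($text);
    close $out or die "Cannot close $path: $!";
}
```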

This (dragging and dropping) seems to work fine for most stand-alone modules
(where a C compiler is not needed). In some cases, I do look at the
Makefile.PL, for example with MARC::Charset, where it was necessary to create
a database file of East Asian character sets. Of course, once I got that
installed (through drag-and-drop), it gave a number of errors (when I ran
the tests), probably because of my operating system (MacOS 9.2.2) not
working well with Unicode?

I do generally try running each of the test files when I first install a new
module, just to make sure they work ok, but I've not usually bothered to
look at how the tests or the Makefile.PL work. This is one reason I haven't
tried to distribute my modules through CPAN.

 
Bryan Baldus
http://home.inwave.com/eija/
(http://home.inwave.com/eija/readme.htm)


Re: Warnings during decode() of raw MARC

2004-08-18 Thread Ed Summers
> ... I've not usually bothered to look at how the tests or the
> Makefile.PL work. This is one reason I haven't tried to distribute my
> modules through CPAN.

What, no OS X yet!? The drag-and-drop trick is what you are stuck with
in MacPerl, and it's kind of a testament to Perl's flexibility that you can
even do this. However, you should consider getting your stuff into the CPAN
cookie-cutter mold if you have the time and energy. Understanding
ExtUtils::MakeMaker is not necessary (in fact that way lies madness),
but understanding the little that you have to do to get a distribution 
together is worth the effort.
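As an illustration of how little is involved, a minimal Makefile.PL could look like the sketch below. The module name, file paths and script name are hypothetical placeholders; `WriteMakefile` and its `NAME`, `VERSION_FROM`, `PREREQ_PM` and `EXE_FILES` keys are standard ExtUtils::MakeMaker options. With this file alongside `lib/` and `t/` directories, the usual perl Makefile.PL / make / make test / make install sequence mentioned earlier in the thread applies.

```perl
# Makefile.PL for a hypothetical distribution; names and paths are placeholders.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'My::MARC::Checks',           # hypothetical module name
    VERSION_FROM => 'lib/My/MARC/Checks.pm',      # file that defines $VERSION
    PREREQ_PM    => { 'MARC::Record' => 0 },      # modules the code depends on
    EXE_FILES    => ['bin/lintallchecks.pl'],     # hypothetical script location
);
```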

This way you don't have to distribute the code yourself and it is made 
available at hundreds of mirrors around the world; plus you benefit from the 
CPAN tools at large: documentation [1], ticketing [2] and testing [3]
(among others).

Even if you don't send your code to CPAN for the rest of the world to
enjoy, you can benefit from having installers for your code. Installers
come in handy when you need to migrate your code to a new machine, or
when recovering from a failure of some kind [knock on wood].

In general, bundling your code up into installable packages
encourages you to think of your software in terms of units of
functionality (modules), instead of one big mass of interrelated scripts.
Sam Tregar has a book on writing CPAN modules which is a great place to
start learning about CPAN if you are interested [4].

//Ed

[1] http://search.cpan.org
[2] http://rt.cpan.org
[3] http://testers.cpan.org/
[4] http://sam.tregar.com/book.html