Warnings during decode() of raw MARC
I'm probably missing something obvious, but I have been unsuccessful in trying to capture the warnings reported by MARC::Record that are set by MARC::File::USMARC->decode(). Is there a simple way to store the warnings reported during the decode() process (using a MARC::Record or MARC::Batch object)?

When MARC::Record/Batch call next() on a record with problems reported during decoding, the warning (for example, warnings about indicators being forced to blanks, or about entirely empty subfields) prints to the screen (STDERR?), but I can't seem to capture those warnings directly in an array (for later reporting). I have tried using:

    #my $batch = MARC::Batch->new('USMARC', "$inputfile");
    my @basewarnings = $batch->warnings();

or

    #$record = $batch->next()
    my @basewarnings = $record->warnings();
    push @haswarnings, @basewarnings;

Both seem to fail to capture the warnings reported by MARC::File::USMARC.

As a workaround, I wrote rawanddecodedscan.pl [1], which works well enough as a stand-alone program. However, now I would like to integrate those warnings into the warnings reported by my lintallchecks.pl (and related MARC::Lint-based) programs [2]. In the Raw and Decoded Scan program, I got around the problem by opening the file of MARC records twice and then going through each record simultaneously, decoding one manually and the other using MARC::Record.

Thank you for your assistance,

Bryan Baldus
http://home.inwave.com/eija/
[EMAIL PROTECTED]

[1] Raw and Decoded Scan script: http://home.inwave.com/eija/fullrecscripts/rawanddecodedscan.txt
[2] Lint All Checks script: http://home.inwave.com/eija/fullrecscripts/lintallchecks.txt
RE: Warnings during decode() of raw MARC
> From: Bryan Baldus [mailto:[EMAIL PROTECTED]
> Sent: 18 August, 2004 09:24
> Subject: Warnings during decode() of raw MARC
>
> I'm probably missing something obvious, but I have been
> unsuccessful in trying to capture the warnings reported by
> MARC::Record that are set by MARC::File::USMARC->decode(). Is
> there a simple way to store the warnings reported during the
> decode() process (using a MARC::Record or MARC::Batch object)?
>

How about this technique:

    #!perl
    package main;

    # Stand-in for any code that prints diagnostics to STDERR.
    sub out {
        print STDERR "ERR: Error 1\n";
        print STDERR "ERR: Error 2\n";
        print STDERR "ERR: Error 3\n";
        return;
    }

    sub main {
        my @errs = ();
        open SAVERR, ">&STDERR";        # keep a copy of the real STDERR
        open STDERR, ">errors.txt";     # send STDERR to a file
        open GETERR, "<errors.txt";     # open the same file for reading
        out();                          # run the code that emits the messages
        while (<GETERR>) { push(@errs,$_); }
        close(GETERR);
        open STDERR, ">&SAVERR";        # restore the real STDERR
        print STDOUT scalar(@errs),"\n";
        return 0;
    }

    exit main();
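A variation on the technique above that avoids the temporary file is to point STDERR at an in-memory scalar for the duration of a call. This is only a sketch: capture_stderr() is a made-up helper name, it assumes Perl 5.8 or later (where open() accepts a scalar reference), and it only sees output written through Perl's own STDERR handle, not output from external programs.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code reference and return whatever it
# printed to STDERR as a list of lines.
sub capture_stderr {
    my ($code) = @_;
    my $buffer = '';
    {
        # Localize the glob so the real STDERR comes back on scope exit.
        local *STDERR;
        open STDERR, '>', \$buffer
            or die "Cannot redirect STDERR to a scalar: $!";
        $code->();
        close STDERR;
    }
    # Split on line starts so each entry keeps its trailing newline.
    return split /^/m, $buffer;
}

# Stand-in for code that reports problems on STDERR.
my @errs = capture_stderr( sub {
    print STDERR "ERR: Error 1\n";
    print STDERR "ERR: Error 2\n";
    print STDERR "ERR: Error 3\n";
} );

print scalar(@errs), "\n";
```

In a real program the code reference would wrap the call to $batch->next(), and the collected lines could be pushed onto whatever array is used for later reporting.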
Net::Z3950, OPAC record syntax & multiple MFHD 866
Please excuse the cross-posting (perl4lib & Net-z3950).

I am working with a Perl script designed to query our catalog via Net::Z3950 and retrieve a journal record. The OPAC record syntax is specified because the ultimate point of the script [1] is to parse the journal holdings to determine if a particular year is owned by our library. Our holdings (MFHD) records often contain multiple 866 fields (which contain the actual holdings info); however, Net::Z3950 only returns the *last* 866 from a MFHD record, thereby giving an incomplete list of holdings.

Below is the relevant code:

    use Net::Z3950;
    $issn         = '0028-0836';
    $query        = '@attr 1=8 ' . $issn;
    $target       = 'pulse.uta.edu';
    $port         = 7099;
    $database     = 'pulse';
    $recordSyntax = 'OPAC';
    $conn = new Net::Z3950::Connection($target, $port, databaseName => $database);
    $rs = $conn->search(-prefix => $query);
    $rs->option(preferredRecordSyntax => $recordSyntax);
    for ( $i = 1; $i <= $rs->size(); $i++ ) {
        $rec = $rs->record($i);
        $marc = $rec->render();
        print "$marc";
    }

If I search for the journal Nature (ISSN 0028-0836) which in our catalog has these multiple 866s in the first holdings record:

    866 0 _av.253(1975)-v.344(1990:Apr.),
    866 0 _av.345(1990)-v.426(2003:Nov.20),
    866 0 _av.426(2003:Dec.)-v.429(2004:May)
    866 0 _aINDEXES v.277(1979)-v.348(1990),v.403-408(2000),v.415(2002)-v.426(2003)

...I get this MARC data returned by Net::Z3950. Note the "enumAndChron" line which contains the 866 info.

    * Bibliographic record:
    245 00 $aNature.
    260 $a[London, etc.,$bMacmillan Journals ltd.]

    * Holdings record 1 of 4:
    typeOfRecord: y
    encodingLevel: 4
    receiptAcqStatus: 4
    generalRetention: 8
    completeness: 4
    dateOfReport: 00
    nucCode: sel,per
    localLocation: Science & Engineering Library: Periodicals
    callNumber: Q 1
    enumAndChron: ^_aINDEXES v.277(1979)-v.348(1990),v.403-408(2000),v.415(2002)-v.426(2003)

As you can see, Net::Z3950 only returns the last 866 field.

So my questions are:

1) Has anyone else noticed/experienced this behavior (i.e. only getting the last 866)? I'm trying to determine if this behavior is unique to how I am implementing/configuring Net::Z3950 and/or if it is ILMS specific. This is my first time using Net::Z3950, so if I'm doing something wrong, please correct me.

2) Is this behavior by design or is it a bug? According to the MARC standard, the MFHD 866 is repeatable [2]. Please disregard the fact that we have Index holdings in the 866 rather than the 868 ...or why we are using multiple 866 even for regular holdings. Those issues are not under my control.

3) If it is a bug, is it in Net::Z3950, in the Z39.50 protocol, or in the Voyager Z39.50 implementation/API? (I have limited experience with Z39.50 and the only other client I have, BookWhere, does not appear to offer the "OPAC" record syntax.) If it is in the Net::Z3950 module can it be fixed? :-)

I have browsed the Net-z3950 listserv archive back to September 2003 (when version 0.36, which added support for the OPAC record syntax, was released) and didn't see any mention of this behavior.

Our software and versions:
    Net::Z3950 version 0.39 (on Solaris)
    Our ILMS is Endeavor's Voyager, version 2001.2

Thanks!

-- Michael

[1] The script is designed as an SFX plug-in and was written by David Walker of Cal State San Marcos
    http://library.csusm.edu/csu/sfx/local_holding_chameleon.asp
[2] MARC 21 Concise Holdings: Textual Holdings Statement Fields (866-868)
    http://www.loc.gov/marc/holdings/echdtext.html

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
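To illustrate why every 866 matters for the "is this year owned" check described above, here is a rough sketch of pulling year ranges out of holdings statements shaped like the ones quoted in the message. It is not the SFX plug-in's code: year_ranges() and holds_year() are hypothetical names, and the parsing assumes four-digit years in parentheses with comma-separated segments, which real holdings data will not always follow.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one 866 $a statement into [first, last]
# year pairs, one pair per comma-separated segment.
sub year_ranges {
    my ($statement) = @_;
    my @ranges;
    for my $segment ( split /,/, $statement ) {
        # Collect the four-digit years that open a parenthesis.
        my @years = $segment =~ /\((\d{4})/g;
        next unless @years;
        push @ranges, [ $years[0], $years[-1] ];
    }
    return @ranges;
}

# Hypothetical helper: true if any statement covers the given year.
sub holds_year {
    my ( $year, @statements ) = @_;
    for my $statement (@statements) {
        for my $range ( year_ranges($statement) ) {
            return 1 if $year >= $range->[0] && $year <= $range->[1];
        }
    }
    return 0;
}

my @holdings = (
    'v.253(1975)-v.344(1990:Apr.),',
    'v.345(1990)-v.426(2003:Nov.20),',
    'v.426(2003:Dec.)-v.429(2004:May)',
);

print holds_year( 1999, @holdings ) ? "held\n" : "not held\n";
```

With only the last 866 available (the INDEXES statement), a check like this would miss 2004, for instance, which is why the truncated response is a problem.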
Re: Warnings during decode() of raw MARC
On Wed, Aug 18, 2004 at 08:23:59AM -0500, Bryan Baldus wrote:
> Both seem to fail to capture the warnings reported by MARC::File::USMARC.

There appears to be a bug in the MARC::Batch::next() code at line 123, which extracts the warnings from the newly instantiated MARC::Record object and stuffs them into the MARC::Batch object so that they are available at that level.

    my @warnings = $rec->warnings();

The bug is that calling warnings() clears the warnings storage in the MARC::Record object as a side effect. MARC::Batch should probably sidestep calling warnings() and dig into the object directly...or there should be another method that doesn't zap the storage.

Let me see if I can duplicate this problem in a test, and then see if the fix actually works. If you can provide a .t file for the MARC::Record distribution that would be handy too :-)

//Ed
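The side effect described above can be shown with a toy class. This is not the MARC::Record source: ToyRecord and peek_warnings() are invented here purely to illustrate a getter that empties its own storage, and what a non-destructive alternative might look like.

```perl
use strict;
use warnings;

package ToyRecord;

sub new {
    my ( $class, @messages ) = @_;
    return bless { _warnings => [@messages] }, $class;
}

# Destructive getter: returns the stored warnings, then clears them.
sub warnings {
    my $self = shift;
    my @copy = @{ $self->{_warnings} };
    $self->{_warnings} = [];
    return @copy;
}

# Non-destructive getter (hypothetical): leaves the storage alone.
sub peek_warnings {
    my $self = shift;
    return @{ $self->{_warnings} };
}

package main;

my $rec = ToyRecord->new('Invalid indicator forced to blank');

# A container object that calls warnings() first drains the record...
my @seen_by_batch = $rec->warnings();

# ...so a later caller gets nothing back.
my @seen_by_caller = $rec->warnings();

print scalar(@seen_by_batch), " then ", scalar(@seen_by_caller), "\n";
```

If the container used something like peek_warnings() instead, both levels would see the same messages.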
RE: Net::Z3950, OPAC record syntax & multiple MFHD 866 - SOLVED
I have been informed that this is a Voyager ILMS Z39.50 server bug. (Thanks Sandy!)

Sorry for the false alarm... didn't mean to cast any aspersions on Net::Z3950!

-- Michael
Re: Warnings during decode() of raw MARC
On Wed, Aug 18, 2004 at 12:56:17PM -0500, Bryan Baldus wrote:
> Ok, I'll try to provide a test file. However, since I've been using MacPerl
> (and Windows), I don't usually deal with test files (the usual
>   perl Makefile.PL
>   make
>   make test
>   make install
> doesn't usually work for me, though occasionally I will run the tests
> one-by-one).

How do you typically do the install? MARC::Record is included at the ActiveState PPM Repository, so it should do these things on a Windows platform...assuming nmake or some sort of make variant is being used.

> So, this .t file, should it just be a raw USMARC/MARC21 format
> record exhibiting one of the problems that generates a warning during the
> decode process (probably not, since you said .t rather than .usmarc), or
> does the .t file need to include calling code, and if so, is the Test module
> tutorial (Test::Tutorial) the best place to look for creating such a .t
> file?

Take a look in the t/ subdirectory of the MARC::Record package. These are the tests that get run automatically during a 'make test'. There are lots of resources available for learning how to write perl unit tests [1,2,3].

After your email this morning I added some tests to expose the bug in MARC::Batch::next().

    ## create a batch object for a file that contains a record
    ## with a bad indicator.
    my $batch = MARC::Batch->new( 'USMARC', 't/badind.usmarc' );
    $batch->warnings_off();
    $batch->strict_off();
    my $r = $batch->next();

    ## check the warnings on the batch
    my @warnings = $batch->warnings();
    is( scalar(@warnings), 1, 'got expected amt of warnings off the batch' );
    like( $warnings[0], qr/^Invalid indicator/,
        'got expected err msg off the batch' );

    ## same exact warning should be available on the record
    @warnings = $r->warnings();
    is( scalar(@warnings), 1, 'got expected amt of warnings off the record' );
    like( $warnings[0], qr/^Invalid indicator/,
        'got expected err msg off the record' );

Basically, the test just reads in a record with some messed up indicators and makes sure that it can get the warning from the batch object AND the record object. Sure enough it failed, and after I applied my change it worked fine. Then I ran the entire test suite and it ran fine as well. Having a test suite like this lets you make changes without losing too much sleep worrying about the rippling effects of changes.

//Ed

[1] http://www.wgz.com/chromatic/perl/IntroTestMore.pdf
[2] http://magnonel.guild.net/~schwern/talks/Test_Tutorial/Test-Tutorial.pdf
[3] http://www.petdance.com/perl/automated-testing/
Re: delete_subfields()
MARC::Field->as_string() takes a string of subfields rather than an array. It would be better for as_string() and delete_subfields() to have the same interface. Since as_string() is used a lot in production, delete_subfields() should be the one to change.

Mike O'Regan

From: Ed Summers <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: 08/17/2004 10:09 AM
Subject: delete_subfields()

> Jackie Shieh at Univ of Michigan thought it would be handy to have a
> delete_subfields() method on MARC::Field objects. Basically the method takes
> a list of subfields to delete, and deletes each one, returning the total
> subfields that were removed. If a subfield repeats all of them are deleted.
>
>     $field->delete_subfields( 'z' );
>
> I've committed the new method and a few tests to SourceForge if anyone is
> interested in taking a look. [1]
>
> //Ed
>
> [1] http://prdownloads.sourceforge.net/marcpm/MARC-Record-1.4.tar.gz
Re: delete_subfields()
On Wed, Aug 18, 2004 at 03:34:11PM -0500, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> MARC::Field->as_string() takes a string of subfields rather than an array.
> It would be better for as_string() and delete_subfields() to have the same
> interface. Since as_string() is used a lot in production,
> delete_subfields() should be the one to change.

Or they could be smart enough to Do The Right Thing.

xoa

--
Andy Lester => [EMAIL PROTECTED] => www.petdance.com => AIM:petdance
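One way to read "Do The Right Thing" is to accept either calling style and normalize it internally. The sketch below is an assumption about how that could look, not the code that went into MARC::Field: normalize_subfield_codes() and delete_codes() are hypothetical, and subfields are modeled as a plain array of [code, value] pairs rather than a real field object.

```perl
use strict;
use warnings;

# Hypothetical helper: accept 'az', ('a', 'z') or ['a', 'z'] and
# return the distinct one-character subfield codes in order.
sub normalize_subfield_codes {
    my @args = map { ref $_ eq 'ARRAY' ? @$_ : $_ } @_;
    my %seen;
    return grep { !$seen{$_}++ } map { split //, $_ } @args;
}

# Toy deletion over [code, value] pairs; returns how many were removed.
sub delete_codes {
    my ( $subfields, @spec ) = @_;
    my %doomed  = map { $_ => 1 } normalize_subfield_codes(@spec);
    my @kept    = grep { !$doomed{ $_->[0] } } @$subfields;
    my $removed = @$subfields - @kept;
    @$subfields = @kept;
    return $removed;
}

my @subfields = ( [ a => 'Nature.' ], [ z => 'first' ], [ z => 'second' ] );
my $removed = delete_codes( \@subfields, 'z' );

print "removed $removed, kept ", scalar(@subfields), "\n";
```

The trade-off is that a string argument is always split into single characters, so this style only works while subfield codes stay one character long.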
Re: delete_subfields()
On Wed, Aug 18, 2004 at 03:34:11PM -0500, [EMAIL PROTECTED] wrote:
> MARC::Field->as_string() takes a string of subfields rather than an array.
> It would be better for as_string() and delete_subfields() to have the same
> interface. Since as_string() is used a lot in production,
> delete_subfields() should be the one to change.

Nice catch, keeping things consistent is a good thing. I've made the change and checked in to CVS.

//Ed
RE: Warnings during decode() of raw MARC
>How do you typically do the install? MARC::Record is included at the
>ActiveState PPM Repository, so it should do these things on a Windows
>platform...assuming nmake or some sort of make variant is being used.

At home (on the Mac), I just drop the MARC folder in my site_lib folder in the MacPerl folder (MacPerl adds site_perl to @INC automatically, I believe). This is after I expand the tar.gz files with Stuffit Expander. There used to be an installer.pl for Mac, but it no longer works with the current version of MacPerl (5.8.0a). Since the documentation in MacPerl seems to indicate that it is not compatible with MakeMaker, drag-drop installation seems to be the easiest alternative. To convert line endings, I use either BBEdit Lite, a 3rd party program, or a script I just wrote that should convert line endings and change the Type and Creator (to TEXT and BBEdit).

In Windows, I take the folder from home, convert the line endings from Mac to DOS, and then drop the MARC folder in C:\Perl\site\lib\.

This (dragging and dropping) seems to work fine for most stand-alone modules (where a C compiler is not needed). In some cases, I do look at the Makefile.PL, for example with MARC::Charset where it was necessary to create a database file of EastAsian character sets. Of course, once I got that installed (through drag-dropping), it gave a number of errors (when I ran the tests), probably because of my operating system (MacOS 9.2.2) not working well with Unicode?

I do generally try running each of the test files when I first install a new module, just to make sure they work ok, but I've not usually bothered to look at how the tests or the Makefile.PL work. This is one reason I haven't tried to distribute my modules through CPAN.

Bryan Baldus
http://home.inwave.com/eija/
(http://home.inwave.com/eija/readme.htm)
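The Mac-to-DOS line-ending conversion mentioned above can be sketched in a few lines. This is not Bryan's script (which also sets the Mac Type and Creator); convert_line_endings() is a hypothetical function. It spells the characters as octal escapes because "\n" does not mean the same byte on every platform (in MacPerl it is a carriage return).

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every line ending in $text to the
# convention named by $target ('mac', 'unix' or 'dos').
sub convert_line_endings {
    my ( $text, $target ) = @_;
    my %eol = (
        mac  => "\015",        # CR
        unix => "\012",        # LF
        dos  => "\015\012",    # CR LF
    );
    my $new = $eol{$target}
        or die "Unknown line-ending style '$target'";
    # Match CR LF first so a DOS ending is not treated as two endings.
    $text =~ s/\015\012|\015|\012/$new/g;
    return $text;
}

my $mac_source = "package MARC::Toy;\0151;\015";
my $dos_source = convert_line_endings( $mac_source, 'dos' );

print length($mac_source), " bytes in, ", length($dos_source), " bytes out\n";
```

When applying this to files, the handles should be opened with binmode so that Perl's own newline translation does not interfere. It is meant for module source files only; running it over raw MARC records would corrupt them.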
Re: Warnings during decode() of raw MARC
> ... I've not usually bothered to look at how the tests or the
> Makefile.PL work. This is one reason I haven't tried to distribute my
> modules through CPAN.

What, no OS X yet!? The drag and drop trick is what you are stuck with in MacPerl, and it's kind of a testament to Perl's flexibility that you can even do this.

However, you should consider getting your stuff into the CPAN cookie cutter mold if you have the time and energy. Understanding ExtUtils::MakeMaker is not necessary (in fact that way lies madness), but understanding the little that you have to do to get a distribution together is worth the effort. This way you don't have to distribute the code yourself and it is made available at hundreds of mirrors around the world; plus you benefit from the CPAN tools at large: documentation [1], ticketing [2] and testing [3] (among others).

Even if you don't send your code to CPAN for the rest of the world to enjoy, you can benefit from having installers for your code. Installers come in handy when you need to migrate your code to a new machine, or when recovering from a failure of some kind [knock on wood]. In general, bundling your code up into installable packages encourages you to think of your software in terms of units of functionality (modules), instead of one big mass of interrelated scripts.

Sam Tregar has a book on writing CPAN modules which is a great place to start learning about CPAN if you are interested [4].

//Ed

[1] http://search.cpan.org
[2] http://rt.cpan.org
[3] http://testers.cpan.org/
[4] http://sam.tregar.com/book.html
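For a sense of how little a pure-Perl distribution needs, here is a minimal Makefile.PL sketch. The module name and version are placeholders made up for this example, and the single prerequisite is an assumption about what such a module would depend on; a real distribution would usually take its version from the module file and also ship a MANIFEST, a README and a t/ directory.

```perl
# Makefile.PL: minimal sketch for a hypothetical pure-Perl distribution
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'MARC::Toy::Checks',    # placeholder module name
    VERSION   => '0.01',                 # placeholder version
    PREREQ_PM => {
        'MARC::Record' => 0,             # assumed dependency, any version
    },
);
```

Running "perl Makefile.PL" in the distribution directory writes a Makefile, after which the usual make, make test and make install steps apply on platforms that have a make program.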