I have updated my MARC-related Perl modules and scripts again [1]. Most notable this time, in addition to the updates to BBMARC.pm and Lintadditions.pm, is the new module, Errorchecks.pm (MARC::Errorchecks) [2]. It has an associated calling program, lintallchecks.pl [3]. Both are described below, after the questions.
Associated with the updates, I have a few questions/problems.

Question 1: As part of my updates to [MARC::] Lintadditions.pm [4] and [MARC::] BBMARC.pm [5], I added validation checks against MARC code lists for languages, geographic areas, and countries. During the validation, for each field, I read data from the end of the module into an array (with readcodedata(), in both BBMARC and Lintadditions), and then use grep to search for a match, first in the valid codes, then the invalid codes. This seems like it imposes a significant amount of work, calling the readcodedata() subroutine every time I call validate008() or check_043 (or check_041 once it is completed) to populate the valid and invalid code arrays. I could make those arrays global variables, but if so, what is the best place to put them? Is there an easier/more efficient way to handle these validation checks?

--------------------------------------------------------------

Question 2: My new module, [MARC::] Errorchecks.pm has a subroutine, check_all_subs($record), which is designed to call all of the checking subroutines in the module, compile a list/array of warnings/errors, and return the array reference. For all of the checks except check_003 and check_010, the following works:

    push @errorstoreturn, (@{[subroutine_name]($record)});

([subroutine_name] being replaced with the appropriate call).

With check_003 and check_010, I get errors relating to uninitialized use of empty array references. I believe this has to do with next-like returns from those subroutines, such as:

    unless (($record->field('010')) && ($record->field('010')->subfield('a'))) {return;}

As a workaround, I used the following:

    my $errorsin003 = check_003($record);
    push @errorstoreturn, @$errorsin003 if ($errorsin003);

Is this what I should do for all of my calls? Should I be returning something in check_003? If I return a defined/non-zero value, it seems like my @errorstoreturn array will contain a number of extra, non-error values.
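
To make the Question 2 situation concrete, here is a minimal sketch of two possible ways around it. This is not the code from Errorchecks.pm: the record is a plain hash rather than a MARC::Record object, and check_with_ref() and check_with_bare_return() are hypothetical stand-ins for subroutines such as check_010 and check_003. The idea is either to have every check return an array reference (empty when there is nothing to report), or to guard the dereference with "|| []" in the caller.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a check such as check_010. The "record" is a
# plain hash of tag => value here, not a MARC::Record object.
sub check_with_ref {
    my ($record) = @_;
    my @warnings;

    # Nothing to check: return an empty array ref instead of a bare return.
    return \@warnings unless defined $record->{'010'};

    push @warnings, "010: value has trailing whitespace"
        if $record->{'010'} =~ /\s$/;
    return \@warnings;
}

# Hypothetical stand-in for a check that still uses a bare return.
sub check_with_bare_return {
    my ($record) = @_;
    return unless defined $record->{'003'};
    return ["003: unexpected value"] if $record->{'003'} ne 'DLC';
    return;
}

sub check_all_subs {
    my ($record) = @_;
    my @errorstoreturn;

    # Always an array ref, so the dereference is safe.
    push @errorstoreturn, @{ check_with_ref($record) };

    # May be undef; fall back to an empty array ref before dereferencing.
    push @errorstoreturn, @{ check_with_bare_return($record) || [] };

    return \@errorstoreturn;
}
```

With either approach the caller never dereferences an undefined value, and no placeholder values end up in @errorstoreturn, since an empty array contributes nothing to the push.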

--------------------------------------------------------------

Question 3: With all of the checks, using lintallchecks.pl, which calls everything in MARC::Lint, MARC::Lintadditions, and MARC::Errorchecks, the program runs slowly. Without rewriting the code as object-oriented (which, with my limited experience and knowledge, is not anticipated any time soon), what sort of optimizations could I make?

--------------------------------------------------------------

Thank you for any suggestions you might have.

--------------------------------------------------------------

As stated above and on my home page, I have updated the following:

New module: Errorchecks.pm (MARC::Errorchecks): Collection of error checking subroutines similar to MARC::Lint and MARC::Lintadditions. This is currently version 0.95 due to problems with the subroutine calls to check_003 and check_010. Warnings by the interpreter indicate use of uninitialized array references (probably when the program gets to a record without one of those fields). This module will be updated with additional subroutines, similar to the way Lintadditions is updated. It is mainly for checking fields which require data from other parts of the record (Lint's check_xxx subroutines seem to be limited to single-field checking).

Associated script for using MARC::Errorchecks: lintallchecks.txt. This can replace most of the error checking scripts, along with the checking portion of the cleanup full record scripts. It should also work without changes as Errorchecks.pm is updated with new subroutines.

Changes to my main modules:

Lintadditions.pm: Version 1.01: Updated June 17, 2004. Released June 20, 2004.
-Added validation of 043 against GAC list.
-Added check_082.
-Added checks for $b, $h, $n, and $p in 245.
-Other changes/fixes.

BBMARC.pm: Version 1.04: Updated June 16, 2004. Released June 20, 2004.
-Updated as_formatted2() to work with MARC::Record 1.38 (is_control_field() instead of is_control_tag()).
-Fixed bug in validate008 for visual materials running time (hyphen was not escaped, so it was being interpreted as a range indicator).
-Added parse008date($) to allow user to enter yymmdd and get yyyy\tmm\tdd\t$error string back (for other uses).
-Added DATA containing codes from the MARC lists for Countries, Geographic Areas, and Languages, to 2003. Each code set is separated by tabs, and obsolete codes are given following each set of valid codes, in the same format.
-Added readcodedata() subroutine for reading in the data and returning the data in an array for use by validation code, such as in validate008().
-Modified validate008 subroutine to use the DATA to validate language and country codes.

Version 1.03: Updated June 10, not released.
-Contained many of the changes in 1.04, but 1.04 contains the update to validate008, so I wanted a new version.

--------------------------------------------------------------

[1] My home page: http://home.inwave.com/eija
[2] Link to Errorchecks current version: http://home.inwave.com/eija/bryanmodules/MARC-Errorchecks-0.95/Errorchecks.pm.txt (try http://home.inwave.com/eija/bryanmodules/ if the above fails)
[3] lintallchecks.pl: http://home.inwave.com/eija/fullrecscripts/lintallchecks.txt
[4] Link to Lintadditions current version: http://home.inwave.com/eija/bryanmodules/MARC-Lintadditions-1.01/Lintadditions.pm.txt (try http://home.inwave.com/eija/bryanmodules/ if the above fails)
[5] Link to BBMARC current version: http://home.inwave.com/eija/bryanmodules/MARC-BBMARC-1.04/BBMARC.PM.txt (try http://home.inwave.com/eija/bryanmodules/ if the above fails)

I welcome any suggestions, questions, and comments (to this address, or to that listed on my site).

Thank you,

Bryan Baldus
Cataloger
Quality Books Inc.
[EMAIL PROTECTED]
http://home.inwave.com/eija