Hi All,

We get a lot of updates from our client libraries, to keep our union cat up
to date.  There isn't a lot of consistency between the libraries regarding
which ILS they use (they're all small rural libs), or which versions of a
given ILS.  The files of MARC records we get are often... strange.  Things
like missing/extra end-of-field markers, "bad" 008 fields, repetitions of
non-repeatable fields, etc.

Rather than adding a bunch of special cases to our conversion routines, I
thought I'd write a pre-processor to eliminate as many problems as possible
first.  The easiest way to do that is to read each record and then create a
brand new record using its data - letting MARC::Record create it "properly"
rather than trying to fix the original record.

This works extremely well (kudos to Andy and Ed for making my life so much
easier!), except that for some reason, I'm getting an extra x0D x0E after
each new record I print to stdout.  I'm stumped.  Can anyone tell me what
I'm missing here?

#!/usr/bin/perl

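use MARC::Batch;
use MARC::Record;   # loaded explicitly, since the script calls both classes directly
use MARC::Field;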

my $cnt = 0;
my $batch = MARC::Batch->new( 'USMARC', @ARGV );
$batch->strict_off();
$batch->warnings_off();

while ( my $oldmarc = $batch->next ) {
   last if $cnt > 5;  # Just for testing....
   $cnt++;

   next unless $oldmarc->title();   # if this is a garbage record, skip it.

   my $oldleader = $oldmarc->leader();

   my $newmarc = MARC::Record->new();
   $newmarc->leader( $oldleader );

   my @oldfields = $oldmarc->fields();
   my @newfields = ();

   foreach my $oldfield (@oldfields) {
       my $newfield = undef;
       if ($oldfield->is_control_field()) {
           my $tag = $oldfield->tag();
           my $data = $oldfield->data();
           $newfield = MARC::Field->new( $tag, $data );
       } else {
           my $tag = $oldfield->tag();
           # Test for defined/non-empty rather than truth: '0' is a valid indicator.
           my $ind1 = $oldfield->indicator(1);
           my $ind2 = $oldfield->indicator(2);
           $ind1 = ' ' unless defined $ind1 && length $ind1;
           $ind2 = ' ' unless defined $ind2 && length $ind2;

           my @oldsubfields = $oldfield->subfields();
           my @newsubfields = ();
           foreach my $oldsubfield (@oldsubfields) {
               push @newsubfields, $oldsubfield->[0];   # subfield code
               push @newsubfields, $oldsubfield->[1];   # subfield data
           }

           $newfield = MARC::Field->new( $tag, $ind1, $ind2, @newsubfields );
       }
       if ($newfield) {
           push @newfields, $newfield;
       }
   }
   $newmarc->insert_fields_ordered( @newfields );
   print $newmarc->as_usmarc();
}

Of course, if I just run this routine twice (once on the original file, and
once on the output), it eliminates those "extra" empty non-records.  But I'd
really like to figure out why they are getting in there in the first place
(that is, why the extra x0D x0E gets written).
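
To pin down exactly which bytes land between records, a small byte-level check on the output file may help. This is only a rough sketch and not part of MARC::Record: stray_bytes is a hypothetical helper name, and it assumes every record starts with a correct 5-digit record length in leader positions 00-04, so a wrong length in a leader will throw the walk off.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a buffer of concatenated MARC records and report any bytes that sit
# between the end of one record and the start of the next.
# Assumption: each record begins with a correct 5-digit record length.
sub stray_bytes {
    my ($data) = @_;
    my $end = length($data);
    my $pos = 0;
    my @found;

    while ( $pos < $end ) {
        if ( substr( $data, $pos, 5 ) =~ /^(\d{5})$/ && $1 >= 5 ) {
            $pos += $1;    # trust the leader and jump over the whole record
            next;
        }

        # Not a usable record length: gather bytes until the next run of
        # five digits (or end of data), always advancing at least one byte.
        my $start = $pos;
        do { $pos++ }
            while ( $pos < $end && substr( $data, $pos, 5 ) !~ /^\d{5}$/ );

        my $hex = join ' ',
            map { sprintf '%02X', ord($_) }
            split //, substr( $data, $start, $pos - $start );
        push @found, [ $start, $hex ];
    }
    return @found;
}

# When given a file name, print the offset and hex value of each stray run.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;    # read raw bytes, no newline translation
    my $data = do { local $/; <$fh> };
    close $fh;
    printf "offset %d: %s\n", @$_ for stray_bytes($data);
}

1;
```

Run against the output file (the script and file names here are made up), e.g. "perl straybytes.pl cleaned.mrc", it prints one line per run of unexpected bytes, which at least shows whether the extra bytes are the same after every record.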

Thanks, and sorry for the long post,
-David
