Do your records have the UTF-8 encoding byte set in the LDR (leader)? Byte 9
(counting from 0) should be 'a' for UTF-8.
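
One way to check this without going through MARC::Batch is to look at the raw bytes, since the leader is the first 24 bytes of each record and records end with the 0x1D terminator. The following is only a minimal sketch under those assumptions; the function names leader_encoding and count_utf8_flags are made up for illustration, and the file name is taken from the command line. (With a parsed MARC::Record object, the same character should be available via substr( $record->leader, 9, 1 ).)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the character coding scheme flag (leader position 09) of a raw
# MARC 21 record: 'a' means UCS/Unicode (UTF-8), a blank means MARC-8.
# (Hypothetical helper, not part of MARC::Batch or MARC::Record.)
sub leader_encoding {
    my ($raw) = @_;
    return substr( $raw, 9, 1 );
}

# Count how many records in a file are flagged as UTF-8 and how many are not.
# (Hypothetical helper; reads the file as raw bytes, no decoding layer.)
sub count_utf8_flags {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/ = "\x1D";    # MARC record terminator
    my ( $utf8, $other ) = ( 0, 0 );
    while ( my $raw = <$fh> ) {
        next if length($raw) < 24;    # skip anything shorter than a leader
        if ( leader_encoding($raw) eq 'a' ) { $utf8++ }
        else                                { $other++ }
    }
    close $fh;
    return ( $utf8, $other );
}

# Usage: ./check_leader.pl und.marc
if (@ARGV) {
    my ( $utf8, $other ) = count_utf8_flags( $ARGV[0] );
    print "flagged UTF-8: $utf8, not flagged: $other\n";
}
```

If most records come back as "not flagged", the data may well be UTF-8 on disk while the leader still claims MARC-8, which is the mismatch the question above is asking about.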

-Tim

Timothy Prettyman
University of Michigan Library/LIT


On Tue, Mar 26, 2013 at 4:22 PM, Eric Lease Morgan <emor...@nd.edu> wrote:

>
> For the life of me I can't figure out how to do reading and writing of
> UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> grepping for a particular string illustrates the validity:
>
>   $ marcdump und.marc | grep Sainte-Face
>   und.marc
>   1000 records
>   2000 records
>   3000 records
>   4000 records
>   5000 records
>   6000 records
>   7000 records
>   8000 records
>   9000 records
>   10000 records
>   11000 records
>   12000 records
>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610 20 _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
>
> I then run a Perl script that simply reads each record and dumps it to
> STDOUT. Notice how I define both my input and output as UTF-8:
>
>   #!/shared/perl/current/bin/perl
>
>   # configure
>   use constant MARC => './und.marc';
>
>   # require
>   use strict;
>   use MARC::Batch;
>
>   # initialize
>   binmode ( MARC, ":utf8" );
>   my $batch = MARC::Batch->new( 'USMARC', MARC );
>   $batch->strict_off;
>   $batch->warnings_off;
>   binmode( STDOUT, ":utf8" );
>
>   # read & write
>   while ( my $marc = $batch->next ) { print $marc->as_usmarc }
>
>   # done
>   exit;
>
> But my output is munged:
>
>   $ ./marc.pl > und.mrc
>   $ marcdump und.mrc | grep Sainte-Face
>   und.mrc
>   1000 records
>   2000 records
>   3000 records
>   4000 records
>   5000 records
>   6000 records
>   7000 records
>   8000 records
>   9000 records
>   10000 records
>   11000 records
>   12000 records
>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610    _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
>
> What am I doing wrong!?
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
> 574/631-8604
>
>
>
>
