Do your records have the UTF-8 encoding byte set in the leader (LDR)? Byte 9 should be 'a' for UTF-8.
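A rough sketch of that check in plain Perl, in case it helps. The helper name `leader_is_utf8` and the two sample leaders are made up for illustration; only the position-09 test reflects the MARC 21 convention described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Record): returns 1 when
# leader position 09 holds 'a', the MARC 21 flag for UCS/Unicode,
# and 0 otherwise (including leaders too short to have a byte 9).
sub leader_is_utf8 {
    my ($leader) = @_;
    return 0 unless defined $leader && length($leader) >= 10;
    return substr( $leader, 9, 1 ) eq 'a' ? 1 : 0;
}

# Made-up 24-character leaders; they differ only at position 09.
my %samples = (
    'flagged as Unicode' => '00714cam a2200205 a 4500',
    'blank (MARC-8)'     => '00714cam  2200205 a 4500',
);

for my $label ( sort keys %samples ) {
    printf "%-20s %s\n", $label,
        leader_is_utf8( $samples{$label} ) ? 'yes' : 'no';
}
```

With MARC::Record, the leader string for each record should be available from `$record->leader`, so the same test can be run inside the `while ( my $marc = $batch->next )` loop.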
-Tim

Timothy Prettyman
University of Michigan Library/LIT

On Tue, Mar 26, 2013 at 4:22 PM, Eric Lease Morgan <emor...@nd.edu> wrote:

>
> For the life of me I can't figure out how to do reading and writing of
> UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> greping for a particular string illustrates the validity:
>
>   $ marcdump und.marc | grep Sainte-Face
>   und.marc
>   1000 records
>   2000 records
>   3000 records
>   4000 records
>   5000 records
>   6000 records
>   7000 records
>   8000 records
>   9000 records
>   10000 records
>   11000 records
>   12000 records
>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610 20 _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
>
> I then run a Perl script that simply reads each record and dumps it to
> STDOUT. Notice how I define both my input and output as UTF-8:
>
>   #!/shared/perl/current/bin/perl
>
>   # configure
>   use constant MARC => './und.marc';
>
>   # require
>   use strict;
>   use MARC::Batch;
>
>   # initialize
>   binmode ( MARC, ":utf8" );
>   my $batch = MARC::Batch->new( 'USMARC', MARC );
>   $batch->strict_off;
>   $batch->warnings_off;
>   binmode( STDOUT, ":utf8" );
>
>   # read & write
>   while ( my $marc = $batch->next ) { print $marc->as_usmarc }
>
>   # done
>   exit;
>
> But my output is munged:
>
>   $ ./marc.pl > und.mrc
>   $ marcdump und.mrc | grep Sainte-Face
>   und.mrc
>   1000 records
>   2000 records
>   3000 records
>   4000 records
>   5000 records
>   6000 records
>   7000 records
>   8000 records
>   9000 records
>   10000 records
>   11000 records
>   12000 records
>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610 _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
>
> What am I doing wrong!?
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
> 574/631-8604
>