An example MARC record is here: http://www.mediafire.com/file/u5cxkrfwh9ew09z/example.zip

When I process the record above in perl 5.8, MARC::Record version 1.38, and Encode.pm version 2.12, the record comes out fine.

When I use perl 5.10, MARC::Record version 2.0.0, and Encode.pm 2.40 the record comes out corrupted and MARC::Record will no longer read the result.

The problem is with a Unicode character (big surprise). The earlier version leaves the two-byte UTF-8 sequence \xC3\xA1 intact; the later version changes it to the single byte \xE1, which is not valid UTF-8 on its own. I've read as many of the perl4lib messages on the subject of UTF-8 as I could, but my head is spinning. I'm hoping that by including a complete but simple Perl program and making a MARC record available, somebody can explain to me in detail what is going on. My inclination is simply to revert to the earlier version of Perl, but if I really understood the issue, that might not be necessary.

Here is the test program I use:

use strict;
use warnings;
use MARC::Batch;

# Read the first record from the file named on the command line.
my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
$batch->strict_off();
$batch->warnings_off();
#binmode( STDOUT, ':utf8' );
my $record = $batch->next;
print $record->as_usmarc;

Run the program on the record, then run it again on the output and the second time perl quits with an error:

utf8 "\xE1" does not map to Unicode at Encode.pm line 174.

That should not happen.
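
To separate the character handling from MARC::Record, here is a small sketch that uses only the core Encode module. It rests on an assumption I have not confirmed: that the newer MARC::Record decodes UTF-8 field data into Perl characters, so that printing the result to a handle with no encoding layer writes the single byte \xE1. The in-memory filehandle is only a stand-in for STDOUT, and the sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for "a with acute" (U+00E1) followed by "n".
my $bytes = "\xC3\xA1n";

# Decoding turns the two bytes \xC3\xA1 into one Perl character.
my $chars = decode('UTF-8', $bytes);

# Print the characters to an in-memory handle that has no encoding layer
# (a stand-in for STDOUT without binmode).
my $raw = '';
open(my $fh, '>', \$raw) or die "cannot open in-memory handle: $!";
print {$fh} $chars;
close($fh) or die "cannot close in-memory handle: $!";

# Show the bytes that were written, next to an explicit UTF-8 encode.
print "no layer:        ", unpack('H*', $raw), "\n";
print "explicit encode: ", unpack('H*', encode('UTF-8', $chars)), "\n";

# Try to read the layer-less output back as strict UTF-8.
# A copy is used because decode may modify its source when a check is set.
my $copy = $raw;
my $decoded_ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
print "layer-less output decodes as UTF-8: ", ($decoded_ok ? "yes" : "no"), "\n";
```

If that assumption holds, the first run of my test program would be writing \xE1 in the same way, and the second run would then fail when it tries to decode it.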

Why the different behavior with the different versions? I can't see anything wrong with the original record: it's valid UTF-8 as far as I can tell, and leader byte 9 is correctly set to 'a'. Uncommenting the binmode line seems to work: the character is output unchanged, as it is supposed to be. The problem is that my record batches are a mixture of UTF-8 and MARC-8, and explicitly setting binmode breaks things. I need a solution that transparently handles a mix of record encodings.
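
For what it's worth, the kind of workaround I have in mind looks roughly like the sketch below: leave STDOUT without a layer and encode record by record instead. This is only a sketch under assumptions. marc_output_bytes is a hypothetical helper of mine, not part of MARC::Record, the sample strings are stand-ins for real as_usmarc output, and I am assuming that UTF-8 records come back as character strings while MARC-8 records come back as plain bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of MARC::Record): return bytes that are safe
# to print on a handle with no encoding layer.
#   $marc     - the string to be written (e.g. what as_usmarc returned)
#   $leader09 - leader position 09: 'a' for UTF-8, blank for MARC-8
sub marc_output_bytes {
    my ($marc, $leader09) = @_;

    # Encode only UTF-8 records that Perl holds as a character string;
    # MARC-8 records and strings that are already bytes pass through.
    if ($leader09 eq 'a' && utf8::is_utf8($marc)) {
        return encode('UTF-8', $marc);
    }
    return $marc;
}

# Stand-ins for real records (assumptions for illustration only).
my $utf8_chars  = decode('UTF-8', "\xC3\xA1");  # decoded UTF-8 data
my $marc8_bytes = "\xE2a";                      # MARC-8 style data with a high-bit byte

my $out_utf8  = marc_output_bytes($utf8_chars,  'a');
my $out_marc8 = marc_output_bytes($marc8_bytes, ' ');

print "UTF-8 record:  ", unpack('H*', $out_utf8),  "\n";
print "MARC-8 record: ", unpack('H*', $out_marc8), "\n";
```

In the real program the two arguments would presumably be $record->as_usmarc and substr($record->leader, 9, 1), but I have not tested that against a full batch.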

I rather suspect the problem is with Encode.pm and not MARC::Record, but I can't be sure. It may also be due to a difference in the way perl handles I/O between versions 5.8 and 5.10. BTW, the problem happens on both Windows and Unix.

Thanks for any advice you can give me,

Al
