An example MARC record is here:
http://www.mediafire.com/file/u5cxkrfwh9ew09z/example.zip
When I process the record above in perl 5.8, MARC::Record version 1.38, and
Encode.pm version 2.12, the record comes out fine.
When I use perl 5.10, MARC::Record version 2.0.0, and Encode.pm 2.40 the
record comes out corrupted and MARC::Record will no longer read the result.
The problem is with a Unicode character (big surprise). The earlier version
leaves the two-byte sequence \xC3\xA1 (the UTF-8 encoding of U+00E1, a with
acute) intact; the later version changes it to the single byte \xE1, which
is not valid UTF-8 on its own. I've read as many of the perl4lib messages on the subject
of UTF-8 as I could but my eyes are spinning. I'm hoping by including a
complete but simple perl program and making a MARC record available that
somebody can explain to me in detail what is going on. My inclination is to
simply revert to the earlier version of perl but perhaps if I really
understood the issue that may not be necessary.
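For reference, the byte-level difference can be reproduced without any MARC
code. This is only a sketch using the core Encode module; hex_bytes is a
helper name made up for the example. Whether the newer stack really ends up
writing a decoded character string through a handle with no encoding layer
is an assumption here, not something confirmed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper for this example: show a byte string as hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# The two bytes from the original record, decoded into one character.
my $char = decode('UTF-8', "\xC3\xA1");
printf "characters: %d, code point: U+%04X\n", length($char), ord($char);

# The same character written out under two different encodings.
print 'as UTF-8:   ', hex_bytes(encode('UTF-8', $char)), "\n";
print 'as Latin-1: ', hex_bytes(encode('ISO-8859-1', $char)), "\n";

# A lone \xE1 byte is not well-formed UTF-8, so a strict decode fails.
my $lone = "\xE1";
my $ok = eval {
    decode('UTF-8', $lone, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print 'lone E1 decodes as UTF-8: ', ($ok ? 'yes' : 'no'), "\n";
```

The Latin-1 form of the character is exactly the \xE1 byte that shows up in
the corrupted output, which would fit the guess that the character is being
written without being re-encoded as UTF-8.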
Here is the test program I use:
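use strict;
use warnings;
use MARC::Batch;
# Read the first record from the file named on the command line.
my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
$batch->strict_off();      # keep reading past recoverable errors
$batch->warnings_off();    # do not print parser warnings
# Uncommenting the next line makes the character come out unchanged.
#binmode( STDOUT, ':utf8' );
my $record = $batch->next;
print $record->as_usmarc;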
Run the program on the record, then run it again on the output and the
second time perl quits with an error:
utf8 "\xE1" does not map to Unicode at Encode.pm line 174.
That should not happen.
Why the different behavior with the different versions? I can't see
anything wrong with the original record - it's valid UTF-8 as far as I can
tell. Leader byte 9 is correctly set to 'a'. Uncommenting the binmode line
seems to work - the character is output unchanged, as is supposed to happen.
The problem is that my record batches are a mixture of UTF-8 and MARC-8, and
explicitly setting binmode screws things up. I need a solution that
transparently handles a mix of record encodings.
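One possible shape for such a solution, offered only as a sketch: decide per
record whether the serialized string needs encoding, instead of setting one
layer on the whole handle. record_bytes is a hypothetical helper, not part
of MARC::Record. It assumes that UTF-8 records come back from as_usmarc as
Perl character strings (UTF-8 flag on) while MARC-8 records come back as
plain byte strings; that assumption would need checking against the
installed version. The two sample strings stand in for real records.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the bytes to write for one serialized record.
# Character strings are encoded to UTF-8; byte strings pass through as-is.
sub record_bytes {
    my ($serialized) = @_;
    my $out = $serialized;                      # work on a copy
    utf8::encode($out) if utf8::is_utf8($out);  # characters -> UTF-8 bytes
    return $out;
}

# Stand-in for a UTF-8 record after decoding: one character, U+00E1.
my $utf8_like  = decode('UTF-8', "\xC3\xA1");
# Stand-in for a MARC-8 record: raw bytes (combining acute, then 'a').
my $marc8_like = "\xE2a";

for my $sample ($utf8_like, $marc8_like) {
    my $bytes = record_bytes($sample);
    print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
}
```

In the test program this would mean calling binmode(STDOUT, ':raw') once and
then printing record_bytes($record->as_usmarc) for each record, so that no
output layer touches the MARC-8 bytes.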
I rather suspect the problem is with Encode.pm and not MARC::Record, but I
can't be sure. It may also be due to a change in the way perl handles I/O
between versions 5.8 and 5.10. BTW the problem happens on both Windows and Unix.
Thanks for any advice you can give me,
Al