William,

U+00B9 is a superscript number one:
http://www.fileformat.info/info/unicode/char/00b9/index.htm

I've seen this happen to some dollar signs in records that have been mangled. I can also see that this record is broken because the XML entity ' is in a MARC communications format file.
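To make the byte-versus-code-point distinction concrete, here is a minimal sketch using only the core Encode module. It shows that U+00B9 encoded as UTF-8 is the two bytes C2 B9, while a lone B9 byte is rejected by a strict UTF-8 decode, which is the kind of complaint Encode raises for this record. The helper name utf8_decode_error is made up for illustration; it is not part of Encode or of any MARC module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: attempt a strict UTF-8 decode of $bytes.
# Returns '' on success, or the error text if the decode died.
sub utf8_decode_error {
    my ($bytes) = @_;
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? '' : ($@ || 'unknown error');
}

# U+00B9 correctly encoded as UTF-8 is two bytes: C2 B9.
my $good = encode('UTF-8', "\x{B9}");

# A bare B9 byte is a continuation byte with no lead byte,
# so on its own it is not valid UTF-8.
my $bad = "\xB9";

printf "good bytes:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $good;
printf "good decodes: %s\n", utf8_decode_error($good) eq '' ? 'yes' : 'no';
printf "bad decodes:  %s\n", utf8_decode_error($bad)  eq '' ? 'yes' : 'no';
```

So a record that claims to be UTF-8 but carries a single B9 byte will trip any strict decoder, whatever character the cataloguer originally meant.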
Just a hunch: did you by any chance use your Ruby script to grab this record? If so, a bug was recently fixed in ruby-marc in the conversion of entities when saving in communications format. Upgrade the ruby-marc gem, grab the record again, and see what happens.

If you're using Perl and dealing with XML records, you might also upgrade MARC::File::XML, which fixes some character encoding issues. The latest version went up on CPAN recently and fixed some things for me. It also seems that I had to upgrade another Perl XML module from CPAN to get it working correctly.

Another shortcut that might help is to change leader byte 9 from 'a' to ' ' (blank); that might work just fine for your purposes. I just did this in vim and it read in the whole file OK.

I've been dealing a lot recently with my own character encoding issues, Unicode, and Ruby's support for Unicode (and a healthy dose of Perl's M:F:X as well).

--Jason

On 5/12/07, William Denton <[EMAIL PROTECTED]> wrote:
I have a MARC file [1] that contains a record (ID number 4240822) with a problem. I whipped up this script to have a look at things:

    #!/usr/local/bin/perl -w

    my $oldmarc = shift;
    die "Usage: $0 old.marc" unless defined $oldmarc;

    use MARC::Batch;

    my $batch = MARC::Batch->new('USMARC', $oldmarc);
    $batch->strict_off();

    while (my $record = $batch->next()) {
        print $record->title(), "\n";
    }

I set strict_off so it will "continue after it has encountered what it believes to be bad MARC data," which is handy, but it fails on an error outside that, from the Encode module:

    $ ./test.pl pride-and-prejudice.marc
    Complete novels of Jane Austen
    utf8 "\xB9" does not map to Unicode at /usr/local/lib/perl5/5.8.8/mach/Encode.pm line 166.

Is there any easy way around this that would let me still use MARC::Batch? I'm not sure how I might trap errors with eval here. I can step through the records directly, but the Batch thing is pretty handy.

Thanks,

Bill

[1] http://www.frbr.org/files/pride-and-prejudice.marc

--
William Denton, Toronto : www.miskatonic.org www.frbr.org www.openfrbr.org
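On the question of trapping the error with eval while still using MARC::Batch, one possible shape is sketched below. It rests on two assumptions that have not been checked against this file: that the Encode error is raised inside $batch->next(), and that the reader has already moved past the bad record by the time it dies, so the following call picks up the next record. The function name records_skipping_errors and the $max_errors limit are made up for illustration; the limit is only there so the loop cannot spin forever if the second assumption turns out to be wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull records from a batch-like object (anything with a
# next() method), calling $on_record for each one and skipping any record
# whose next() call dies. Returns the number of records skipped.
sub records_skipping_errors {
    my ($batch, $on_record, $max_errors) = @_;
    $max_errors = 100 unless defined $max_errors;   # assumed safety limit
    my $errors = 0;
    while (1) {
        my $record;
        # The trailing 1 makes the eval return true only if next() did not die.
        my $ok = eval { $record = $batch->next(); 1 };
        if (!$ok) {
            warn "skipping a record: $@";
            last if ++$errors >= $max_errors;
            next;
        }
        last unless defined $record;   # no more records
        $on_record->($record);
    }
    return $errors;
}

# This part only runs when a file name is given, and needs MARC::Batch installed.
if (@ARGV) {
    require MARC::Batch;
    my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
    $batch->strict_off();
    my $skipped = records_skipping_errors(
        $batch,
        sub { print $_[0]->title(), "\n" },
    );
    warn "$skipped record(s) skipped\n" if $skipped;
}
```

Run as ./test.pl pride-and-prejudice.marc, this should print the titles it can read and warn about the rest. If the error is actually raised somewhere other than next() (inside title(), say), the eval would need to wrap that call instead.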