William,
U+00B9 is a superscript number one:
http://www.fileformat.info/info/unicode/char/00b9/index.htm
I've seen this happen to some dollar signs in records that have been
mangled. I can also see that this record is broken because the XML
entity for ' (an apostrophe) is in a MARC communications format file.

Just a hunch: Did you by any chance use your Ruby script to grab this
record? If so, a bug in the conversion of entities when saving in
communications format was recently fixed in ruby-marc. Upgrade the
ruby-marc gem, grab the record again, and see what happens.

If you're using Perl and dealing with XML records, you might also
upgrade MARC::File::XML; the latest release fixes some character
encoding issues. It went up on CPAN recently and fixed some things for
me, though it seems I also had to upgrade another Perl XML module from
CPAN to get it working correctly.

Another shortcut that might help is to change leader byte 9 from 'a'
to ' ' (blank) and it might work just fine for your purposes. I just
did this in vim and it read in the whole file ok.
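
If you'd rather not make that edit by hand, here is a minimal Perl
sketch of the same idea. It assumes the file is plain MARC
communications format, where each record ends with the record
terminator byte 0x1D and the leader is the first 24 bytes. The
subroutine names and the script name blank09.pl are made up for
illustration; they are not part of MARC::Batch or any other module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of one raw MARC record with leader
# position 09 changed from 'a' (UCS/Unicode) to blank (MARC-8).
sub blank_leader_09 {
    my ($raw) = @_;
    # The leader is the first 24 bytes; position 09 is the tenth byte.
    if (length($raw) >= 24 && substr($raw, 9, 1) eq 'a') {
        substr($raw, 9, 1) = ' ';   # one byte for one byte, so lengths stay valid
    }
    return $raw;
}

# Hypothetical helper: copy every record from $in to $out, patching each leader.
sub blank_leader_09_in_file {
    my ($in, $out) = @_;
    open my $ifh, '<:raw', $in  or die "Cannot read $in: $!";
    open my $ofh, '>:raw', $out or die "Cannot write $out: $!";
    local $/ = "\x1D";              # read one record per iteration
    while (my $raw = <$ifh>) {
        print {$ofh} blank_leader_09($raw);
    }
    close $ifh;
    close $ofh or die "Cannot finish writing $out: $!";
}

# Run as: perl blank09.pl old.marc new.marc
blank_leader_09_in_file(@ARGV) if @ARGV == 2;
```

This only changes the flag that tells the reader which encoding to
expect. The data bytes are left as they are, so any non-ASCII
characters may still come out wrong, and it is safest to write to a
new file and keep the original.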

Lately I've been dealing a lot with my own character encoding
issues, Unicode, and Ruby's support for Unicode (and a healthy dose of
Perl's M:F:X as well).

--Jason

On 5/12/07, William Denton <[EMAIL PROTECTED]> wrote:
I have a MARC file [1] that contains a record (ID number 4240822) with a
problem.  I whipped up this script to have a look at things:

   #!/usr/local/bin/perl -w

   my $oldmarc  = shift;
   die "Usage: $0 old.marc" unless defined $oldmarc;

   use MARC::Batch;
   my $batch = MARC::Batch->new('USMARC', $oldmarc);
   $batch->strict_off();

   while (my $record = $batch -> next()) {
       print $record->title(), "\n";
   }

I set strict_off so it will "continue after it has encountered what it
believes to be bad MARC data," which is handy, but it fails on an error
outside that, from the Encode module:

$ ./test.pl pride-and-prejudice.marc
Complete novels of Jane Austen
utf8 "\xB9" does not map to Unicode at 
/usr/local/lib/perl5/5.8.8/mach/Encode.pm line 166.

Is there any easy way around this that would let me still use MARC::Batch?
I'm not sure how I might trap errors with eval here.  I can step through
the records directly, but the Batch thing is pretty handy.

Thanks,

Bill

[1] http://www.frbr.org/files/pride-and-prejudice.marc
--
William Denton, Toronto : www.miskatonic.org www.frbr.org www.openfrbr.org
