Re: Working around a UTF8/Unicode encoding problem

2007-05-12 Thread Jason Ronallo
William, U+00B9 is the superscript one character (SUPERSCRIPT ONE): http://www.fileformat.info/info/unicode/char/00b9/index.htm I've seen this happen to some dollar signs in records that have been mangled. I can also tell that this record is broken because an XML entity for the apostrophe (') appears in a MARC communications format file. Just a
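Both symptoms can be searched for mechanically. Below is a minimal sketch in core Perl. It assumes the file is UTF-8 MARC read as raw bytes, with 0x1D ending each record. `find_suspects` is a hypothetical helper, not part of any MARC module. It flags the UTF-8 bytes for U+00B9 (C2 B9) and anything shaped like an XML character or entity reference:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list suspicious byte sequences in one raw
# MARC record, treated as undecoded bytes.
sub find_suspects {
    my ($raw) = @_;
    my @hits;
    # U+00B9 SUPERSCRIPT ONE is the two bytes C2 B9 in UTF-8.
    while ($raw =~ /\xC2\xB9/g) {
        push @hits, sprintf('U+00B9 at byte %d', pos($raw) - 2);
    }
    # XML character/entity references (&apos;, &#39;, &#x27; ...) should
    # not appear in a MARC communications format record.
    while ($raw =~ /(&(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z]+);)/g) {
        push @hits, sprintf('entity %s at byte %d', $1, pos($raw) - length($1));
    }
    return @hits;
}

# Usage: perl find_suspects.pl file.marc
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    local $/ = "\x1D";    # read one record at a time (0x1D ends a record)
    my $n = 0;
    while (my $raw = <$fh>) {
        $n++;
        print "record $n: $_\n" for find_suspects($raw);
    }
    close $fh;
}
```

The report gives each record's position in the file rather than its ID, so it still works when a record is too damaged to parse.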

Working around a UTF8/Unicode encoding problem

2007-05-12 Thread William Denton
I have a MARC file [1] that contains a record (ID number 4240822) with a problem. I whipped up this script to have a look at things:

    #!/usr/local/bin/perl -w
    my $oldmarc = shift;
    die "Usage: $0 old.marc" unless defined $oldmarc;
    use MARC::Batch;
    my $batch = MARC::Batch->new('USMARC
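The script above (cut off in the archive) reads the file through MARC::Batch. When a single record is damaged, it can also help to look at its bytes without any MARC module decoding them. Here is a minimal sketch of that alternative in core Perl. It is not a reconstruction of the original script. It assumes the record's leader and directory are intact ISO 2709, and that the ID number 4240822 is the record's 001 control number. `control_number` and `visible` are hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 001 of one raw ISO 2709 record, or undef.
sub control_number {
    my ($raw) = @_;
    return if length($raw) < 24;            # leader is 24 bytes
    my $base = substr($raw, 12, 5);         # leader/12-16: base address of data
    return unless $base =~ /^\d{5}$/;
    # Directory: 12-byte entries from byte 24 up to the 0x1E terminator.
    my $dir_end = index($raw, "\x1E", 24);
    return if $dir_end < 0;
    my $dir = substr($raw, 24, $dir_end - 24);
    for (my $i = 0; $i + 12 <= length($dir); $i += 12) {
        # Each entry: tag (3), field length (4), start relative to base (5).
        my ($tag, $len, $start) = unpack 'A3 A4 A5', substr($dir, $i, 12);
        next unless $tag eq '001';
        my $data = substr($raw, $base + $start, $len);
        $data =~ s/[\x1E\x1D]+$//;          # drop field/record terminators
        return $data;
    }
    return;
}

# Show bytes outside printable ASCII as <XX> so mangled characters stand out.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf('<%02X>', ord $1)/ge;
    return $s;
}

# Usage: perl show_record.pl file.marc 4240822
if (@ARGV == 2) {
    my ($file, $want) = @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    local $/ = "\x1D";                      # one record per read
    while (my $raw = <$fh>) {
        my $id = control_number($raw);
        next unless defined $id && $id eq $want;
        # First line is leader + directory, then one line per field.
        print visible($_), "\n" for split /\x1E/, $raw;
        last;
    }
    close $fh;
}
```

Subfield delimiters show up as `<1F>`, and a stray superscript one shows up as `<C2><B9>`.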