RE: Working around a UTF8/Unicode encoding problem

2007-05-15 Thread Doran, Michael D
Monday, May 14, 2007 9:56 AM
> To: perl4lib@perl.org
> Subject: RE: Working around a UTF8/Unicode encoding problem
>
> > From: Jason Ronallo [mailto:[EMAIL PROTECTED]
> > Sent: 12 May, 2007 16:52
> > To: William Denton
> > Cc: perl4lib@perl.org
> > Subject: Re: Wor

Re: Working around a UTF8/Unicode encoding problem

2007-05-14 Thread William Denton
On 12 May 2007, Jason Ronallo wrote:
> Just a hunch: Did you by any chance use your Ruby script to grab this record? If so, a bug was recently fixed in ruby-marc for the conversion of entities when saving in communications format. Upgrade the ruby-marc gem, grab the record again and see what happen

RE: Working around a UTF8/Unicode encoding problem

2007-05-14 Thread Houghton,Andrew
> From: Jason Ronallo [mailto:[EMAIL PROTECTED]
> Sent: 12 May, 2007 16:52
> To: William Denton
> Cc: perl4lib@perl.org
> Subject: Re: Working around a UTF8/Unicode encoding problem
>
> I can also see that this record is broken
> because the XML entity ' is in a MA

Re: Working around a UTF8/Unicode encoding problem

2007-05-12 Thread Jason Ronallo
William,

U+00B9 is a superscript number one: http://www.fileformat.info/info/unicode/char/00b9/index.htm

I've seen this happen to some dollar signs in records that have been mangled. I can also see that this record is broken because the XML entity ' is in a MARC communications format file. Just a

Working around a UTF8/Unicode encoding problem

2007-05-12 Thread William Denton
I have a MARC file [1] that contains a record (ID number 4240822) with a problem. I whipped up this script to have a look at things:

#!/usr/local/bin/perl -w

my $oldmarc = shift;
die "Usage: $0 old.marc" unless defined $oldmarc;

use MARC::Batch;
my $batch = MARC::Batch->new('USMARC