RE: UTF-8 encoding errors

Bryan Baldus Wed, 07 Mar 2007 14:58:05 -0800

On Wednesday, March 07, 2007 2:34 PM, Ron Davies wrote:
>When I do this I get a number of error messages such as :
>"\x{00ce}" does not map to utf8 at myprogram.pl line xxx.
>and in the output file instead of the correct character there is a hex 
>encoding. This happens with Greek but also perfectly ordinary Latin 
>characters.


I can't offer any advice, but I am experiencing what may be similar
difficulties. I finally had a chance to get MARC::Charset and
MARC::File::XML installed and working, so I could try out xml2marc and
marc2xml. After creating a test record containing a field with diacritics, I
tried using marc2xml followed by xml2marc, hoping to end up with records
matching the original. marc2xml appears to have successfully translated the
raw MARC into MARCXML (it left the leader unchanged, with no update to the
record length, though it did set byte 9 to 'a' for Unicode). Unfortunately,
attempting to use xml2marc on any of the .xml files I have results in an
empty file. In some cases I get a message:

"Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line
184, <GEN1> line 1."

In other cases, I get no error messages, but still have an empty file. I
have tried a number of variations in the starting file: marc8.mrc->utf8.xml;
utf8.mrc->utf8.xml; MarcEdit-produced .xml->Perl-produced .mrc.
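
The leader observation above (record length not updated, byte 9 set to 'a')
can be checked independently of MARC::Record by reading the raw bytes of a
.mrc file. The following is only a minimal sketch, written in Python so that
it does not depend on the Perl/Encode installation in question. The function
name inspect_leaders and the report keys are made up for illustration. It
assumes records are delimited by the MARC 21 record terminator (0x1D), and
its UTF-8 check is only a rough hint, since pure-ASCII MARC-8 data also
decodes cleanly as UTF-8.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC 21 end-of-record byte


def inspect_leaders(data):
    """Return one report dict per raw MARC record found in `data` (bytes).

    Hypothetical helper for illustration; not part of any MARC library.
    """
    reports = []
    for chunk in data.split(RECORD_TERMINATOR):
        if not chunk.strip():
            continue  # ignore empty pieces, e.g. after the final terminator
        record = chunk + RECORD_TERMINATOR
        leader = record[:24]

        # Leader positions 00-04: declared record length as ASCII digits.
        declared_text = leader[0:5].decode("ascii", errors="replace")
        declared = int(declared_text) if declared_text.isdigit() else None

        # Rough check: do the record's bytes decode as UTF-8 at all?
        try:
            record.decode("utf-8")
            valid_utf8 = True
        except UnicodeDecodeError:
            valid_utf8 = False

        reports.append({
            "declared_length": declared,
            "actual_length": len(record),
            "length_matches": declared == len(record),
            # Leader position 09: 'a' = UCS/Unicode, blank = MARC-8.
            "leader_09": leader[9:10].decode("ascii", errors="replace"),
            "valid_utf8": valid_utf8,
        })
    return reports
```

Passing it the contents of a file opened in binary mode, for example
inspect_leaders(open("utf8.mrc", "rb").read()), gives one report per record.
A record where leader_09 is 'a' but valid_utf8 is False, or where
length_matches is False, would suggest the file is inconsistent before
xml2marc or Encode ever sees it.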

My system: Windows XP; ActivePerl v5.8.2 built for MSWin32-x86-multi-thread
(Binary build 808)
MARC::Record: 2.0
Encode: 1.9801

Are these problems related to the age of my Perl or Encode?

(If I remember correctly, before switching to MARC::Record 2.0, using
MARC::Record 1.39_1 and xml2marc resulted in records being output but the
field containing diacritics was mangled/deleted/replaced with bad data.)

Thank you for your assistance,

Bryan Baldus
[EMAIL PROTECTED]
http://home.inwave.com/eija
