On Jun 22, 2006, at 5:34 AM, [EMAIL PROTECTED] wrote:
I'm using MARC::Charset::marc8_to_utf8() v0.95 to transcode some
Library of Congress data to utf8, however I'm finding a problem with
character 'ø' (hex 0xB2 - lowercase scandinavian o / latin small
letter o with stroke), this character is transcoding to 0xF8 - which is
not valid utf8 - when it should transcode to 0xC3B8. (According to the
documentation, 0xF8 seems to be the ucs transcoding of this character).

Is this a bug in MARC::Charset or am I missing something?

Well I tried this out in the debugger with perl 5.8.7 and MARC::Charset v0.95:

--

  main::(-e:1):   1
    DB<1> use MARC::Charset qw(marc8_to_utf8)

    DB<2> $utf8 = marc8_to_utf8(chr(0xB2));

    DB<3> print "works" if $utf8 eq chr(0xF8);
  works

--

So it appears to be working fine. Perhaps when you are writing out your data you aren't preparing the filehandle for utf8? Can you provide a simple test script that demonstrates the problem so others can try to replicate?
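To illustrate what I mean about the filehandle, here is a minimal sketch using only core Perl and the Encode module. It assumes the string you get back is a Perl character string holding code point U+00F8, which is what the debugger session above suggests. chr(0xF8) stands in for the marc8_to_utf8() return value so the script runs without MARC::Charset, and utf8_hex() is just a helper name made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for the value returned by marc8_to_utf8(chr(0xB2)) in the
# session above: a one-character Perl string holding code point U+00F8.
my $char = chr(0xF8);

# Hypothetical helper: return the UTF-8 byte sequence of a character
# string as uppercase hex, so we can see what would land in a file.
sub utf8_hex {
    my ($string) = @_;
    return uc(unpack('H*', encode('UTF-8', $string)));
}

printf "code point:  U+%04X\n", ord($char);    # U+00F8
printf "UTF-8 bytes: %s\n", utf8_hex($char);   # C3B8

# Without an encoding layer, Perl writes characters below 0x100 as single
# bytes, so this character would go out as the lone byte 0xF8. Declaring
# the layer makes the filehandle emit the two-byte UTF-8 sequence instead.
binmode(STDOUT, ':encoding(UTF-8)');
print $char, "\n";
```

If that is what is happening in your script, adding the binmode() call (or opening the output file with '>:encoding(UTF-8)') before printing the transcoded records may be all that is needed.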

//Ed



Thanks,

Michael


