On Jun 22, 2006, at 5:34 AM, [EMAIL PROTECTED] wrote:
I'm using MARC::Charset::marc8_to_utf8() v0.95 to transcode some
Library of Congress data to UTF-8, but I'm finding a problem with the
character 'ø' (hex 0xB2, lowercase Scandinavian o / Latin small
letter o with stroke). This character is transcoding to 0xF8, which is
not valid UTF-8, when it should transcode to 0xC3B8. (According to the
documentation, 0xF8 seems to be the UCS transcoding of this
character.)
Is this a bug in MARC::Charset or am I missing something?
Well, I tried this out in the debugger with Perl 5.8.7 and
MARC::Charset v0.95:
--
main::(-e:1): 1
DB<1> use MARC::Charset qw(marc8_to_utf8)
DB<2> $utf8 = marc8_to_utf8(chr(0xB2));
DB<3> print "works" if $utf8 eq chr(0xF8);
works
--
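So it appears to be working fine: the result compares equal to
chr(0xF8), i.e. the single character U+00F8, which only becomes the
bytes 0xC3 0xB8 once it is encoded as UTF-8 on output. Perhaps when you
are writing out your data you aren't preparing the filehandle for
UTF-8? Can you provide a simple test script that demonstrates the
problem so others can try to replicate it?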
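
Here is a minimal sketch of the filehandle point, using only the core
Encode module. It does not call MARC::Charset at all: $char is a
stand-in for the value returned in the debugger session above, on the
assumption that marc8_to_utf8() hands back a Perl character string
rather than raw bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for marc8_to_utf8(chr(0xB2)) (assumption: a one-character
# Perl string holding U+00F8, as the debugger comparison suggests).
my $char = chr(0xF8);

# Explicitly encoding the character as UTF-8 yields two bytes.
my $bytes = encode('UTF-8', $char);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";    # prints "C3 B8"

# With an encoding layer on the handle, Perl performs the same
# encoding automatically whenever the string is written out.
binmode(STDOUT, ':encoding(UTF-8)');
print "$char\n";
```

Without the binmode() call (or an equivalent layer passed to open()),
Perl writes the character out as the single byte 0xF8, which would
match the symptom described.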
//Ed
Thanks,
Michael