Unless I'm very much mistaken, Chris's code is outputting UTF-8 to
the terminal, not MARC-8.
The key is to find a terminal program that correctly displays UTF-8.
I doubt you'll have any trouble finding one -- for example, there
are at least two for Mac OS X alone (Terminal.app and iTerm).
Depending on your platform, freshmeat.net or tucows.com may be the
place to go. This thread from the linux-utf8 list may also be
helpful (I googled for 'terminal UTF-8'):
http://mail.nl.linux.org/linux-utf8/2003-07/msg00231.html
Paul.
On Thursday, July 1, 2004, at 11:22 AM, Houghton,Andrew wrote:
From: Christopher Morgan [mailto:[EMAIL PROTECTED]]
Sent: 01 July, 2004 10:50
Subject: Displaying diacritics in a terminal vs. a browser
I use the $cs->to_utf8 conversion from MARC::Charset to
display MARC Authority records in a browser, and the
diacritics display properly there.
But they don't display properly via STDOUT in my terminal
window (I get two characters instead of one -- one with the
letter and one with the accent mark). Am I doing something
wrong? I'm using:
binmode (STDOUT, ":utf8");
Is there any way around this problem, or is it a limitation
of terminal displays?
I'm not sure what MARC::Charset does internally, but MARC-8
defines the diacritic separate from the base character. So
even using binmode(STDOUT,":utf8") will produce two characters,
one for the base character followed by the diacritic. If you
want them combined then you need to combine them.
It just so happens that I have recently been converting MARC-XML
to RDF. The RDF specification mandates Unicode Normalization Form C
(NFC), which means that the base character and the diacritic are
combined into a single precomposed character where one exists.
MARC-XML uses Unicode Normalization Form D (NFD), which means that
the base character is separate from the diacritic. So I hacked
together some Perl scripts to convert Unicode NFD <-> Unicode NFC.
The scripts require Perl 5.8.0.
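For illustration only (this is not Andy's script): a minimal sketch of
an NFD -> NFC conversion, assuming the Unicode::Normalize module that
ships with Perl 5.8.0 and later. The sample string and variable names
are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);   # core module as of Perl 5.8.0

# Encode output as UTF-8, as in Chris's original snippet.
binmode(STDOUT, ":utf8");

# Hypothetical sample: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. the decomposed form described above (base letter, then diacritic).
my $decomposed = "Caf" . "e\x{0301}";

my $composed  = NFC($decomposed);   # base letter + accent -> one code point
my $roundtrip = NFD($composed);     # and back to the decomposed form

printf "decomposed: %d code points\n", length $decomposed;
printf "composed:   %d code points\n", length $composed;
print  "round trip matches the input\n" if $roundtrip eq $decomposed;
print  "$composed\n";
```

Running the output of $cs->to_utf8 through NFC() before printing may be
enough for terminals that handle precomposed characters but not
combining marks; whether that covers every diacritic depends on a
precomposed form existing for it.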
I was talking with a colleague, just yesterday, about whether we
should unleash these on the Net... They need to be cleaned up a
little and need some basic documentation on how to run the Perl
scripts.
Andy.
Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/