Andy,

Many thanks. I'd be interested in looking at your scripts if you do post
them!

-- Chris 

-----Original Message-----
From: Houghton,Andrew [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 01, 2004 10:23 AM
To: [EMAIL PROTECTED]
Subject: RE: Displaying diacritics in a terminal vs. a browser

> From: Christopher Morgan [mailto:[EMAIL PROTECTED]
> Sent: 01 July, 2004 10:50
> Subject: Displaying diacritics in a terminal vs. a browser
> 
> I use the $cs->to_utf8 conversion from MARC::Charset to display MARC 
> Authority records in a browser, and the diacritics display properly 
> there.
> But they don't display properly via STDOUT in my terminal window (I 
> get two characters instead of one -- one with the letter and one with 
> the accent mark). Am I doing something wrong? I'm using:
>  
>       binmode (STDOUT, ":utf8");
> 
> Is there any way around this problem, or is it a limitation of 
> terminal displays?

I'm not sure what MARC::Charset does internally, but MARC-8 encodes the
diacritic separately from the base character.  So even with
binmode(STDOUT,":utf8") the output will contain two characters: the base
character followed by the combining diacritic.  If you want a single
precomposed character, you need to combine them yourself.

It just so happens that I have recently been converting MARC-XML to RDF.
The RDF specification mandates Unicode Normalization Form C (NFC), in which
the base character and the diacritic are combined into a single precomposed
character where one exists.  MARC-XML uses Unicode Normalization Form D
(NFD), in which the base character is kept separate from the diacritic.
So I hacked together some Perl scripts to convert between NFD and NFC.
The scripts require Perl 5.8.0.
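
To illustrate the idea only (this is not one of the scripts mentioned
above), here is a minimal sketch of an NFD -> NFC conversion using the
Unicode::Normalize module bundled with Perl 5.8.  The sample string and the
variable names are assumptions made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical sample: "e" followed by COMBINING ACUTE ACCENT (U+0301),
# i.e. the decomposed form in which base letter and diacritic are separate.
my $decomposed = "e\x{0301}";

# NFC combines the pair into the precomposed character U+00E9.
my $composed = NFC($decomposed);

# NFD goes the other way and splits it again.
my $roundtrip = NFD($composed);

binmode(STDOUT, ":utf8");
printf "decomposed: %d code point(s)\n", length $decomposed;
printf "composed:   %d code point(s)\n", length $composed;
print "$composed\n";
```

Whether a terminal then shows the composed character correctly still depends
on the terminal itself being set up for UTF-8.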

I was talking with a colleague, just yesterday, about whether we should
unleash these on the Net...  They need to be cleaned up a little and need
some basic documentation on how to run the Perl scripts.


Andy.

Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm

