Andy, Many thanks. I'd be interested in looking at your scripts if you do post them!
-- Chris

-----Original Message-----
From: Houghton,Andrew [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 01, 2004 10:23 AM
To: [EMAIL PROTECTED]
Subject: RE: Displaying diacritics in a terminal vs. a browser

> From: Christopher Morgan [mailto:[EMAIL PROTECTED]
> Sent: 01 July, 2004 10:50
> Subject: Displaying diacritics in a terminal vs. a browser
>
> I use the $cs->to_utf8 conversion from MARC::Charset to display MARC
> Authority records in a browser, and the diacritics display properly
> there.
> But they don't display properly via STDOUT in my terminal window (I
> get two characters instead of one -- one with the letter and one with
> the accent mark). Am I doing something wrong? I'm using:
>
> binmode (STDOUT, ":utf8");
>
> Is there any way around this problem, or is it a limitation of
> terminal displays?

I'm not sure what MARC::Charset does internally, but MARC-8 defines the
diacritic separately from the base character. So even with
binmode(STDOUT, ":utf8") you will get two characters: the base character
followed by the diacritic. If you want them combined, you need to
combine them yourself.

It just so happens that I have recently been converting MARC-XML to RDF.
The RDF specification mandates Unicode Normalization Form C (NFC), in
which the base character and the diacritic are combined. MARC-XML uses
Unicode Normalization Form D (NFD), in which the base character is
separate from the diacritic. So I hacked together some Perl scripts to
convert Unicode NFD <-> Unicode NFC. The scripts require Perl 5.8.0.

I was talking with a colleague, just yesterday, about whether we should
unleash these on the Net... They need to be cleaned up a little and need
some basic documentation on how to run them.

Andy.

Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
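
For the NFD -> NFC step described above, here is a minimal sketch. These are not Andy's scripts, which were not posted in this message. It assumes the Unicode::Normalize module that ships with Perl 5.8, and the sample string is made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Illustration only: not the scripts mentioned in the message.
binmode(STDOUT, ":utf8");

# Hypothetical sample: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. the decomposed (NFD) form of an accented e.
my $decomposed = "e\x{0301}";

# NFC composes base character + combining mark into one code point
# when a precomposed character exists (here U+00E9).
my $composed = NFC($decomposed);

printf "NFD length: %d\n", length $decomposed;
printf "NFC length: %d\n", length $composed;
printf "NFC code point: U+%04X\n", ord $composed;
print "$composed\n";
```

Applying NFC to the string returned by $cs->to_utf8 before printing may give a single glyph in terminals that do not render combining marks well. That depends on the terminal and font, and not every base-plus-diacritic pair has a precomposed form.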