Jane,
 
Thanks very much for the information about Unicode and MARC-8.  I still have a lot to 
learn about the two formats! Since my MARC data is being manipulated primarily in a 
browser via a CGI script, I'll forgo writing a converter for the terminal display for 
now, but I eventually plan to do that. Thanks again!
 
- Chris

  _____  

From: Jacobs, Jane W [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 01, 2004 1:51 PM
To: 'Christopher Morgan'
Subject: RE: Displaying diacritics in a terminal vs. a browser



Hi Chris,

I hope my analysis is correct; I think that two problems are going on here:

1) Your terminal display is very likely not up to the "combining" aspect of combining 
diacriticals.
2) More importantly, there is a technical shift in the placement of diacritical 
marks between MARC-8 and Unicode:
 
A MARC-8 sequence places a combining diacritical mark BEFORE the letter it's supposed 
to combine with, whereas Unicode puts it AFTER the letter it's supposed to 
combine with.
 
Hence, for example, the letter: Ẕ
is produced by the MARC-8 Sequence: 
75 5A (macron below + "Z")
but 
005A 0331  ("Z" + Combining Macron below) in Unicode.
 
I believe if you don't account for this in your UTF-8 transformation, you will get 
either no combining or combining with the wrong character.
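
To illustrate the reordering described above, here is a minimal Perl sketch. It 
assumes the MARC-8 bytes have already been mapped one-for-one to Unicode characters 
but are still in MARC-8 order (mark first, then letter). The helper name 
marks_after_base is hypothetical, and this says nothing about what MARC::Charset 
does internally; if your converter already reorders, this step is unnecessary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Charset): move each run of
# nonspacing combining marks (\p{Mn}) so that it follows the base
# character it precedes, i.e. MARC-8 order -> Unicode order.
sub marks_after_base {
    my ($text) = @_;
    $text =~ s/(\p{Mn}+)(\P{Mn})/$2$1/g;
    return $text;
}

binmode(STDOUT, ":encoding(UTF-8)");

my $marc_order = "\x{0331}Z";    # COMBINING MACRON BELOW, then "Z"
my $fixed      = marks_after_base($marc_order);

# Show the resulting code points in order.
print join(" ", map { sprintf("%04X", ord($_)) } split(//, $fixed)), "\n";
# prints "005A 0331"
```

The regular expression only handles marks classified as Mn, so treat it as a 
starting point rather than a complete converter.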
 
Hope that's useful.
JJ
 



**Views expressed by the author do not necessarily represent those of the Queens 
Library.**

Jane Jacobs
Asst. Coord., Catalog Division
Queens Borough Public Library
89-11 Merrick Blvd.
Jamaica, NY 11432

tel.: (718) 990-0804
e-mail: [EMAIL PROTECTED]
FAX. (718) 990-8566



-----Original Message-----
From: Christopher Morgan [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 01, 2004 10:50 AM
To: [EMAIL PROTECTED]
Subject: Displaying diacritics in a terminal vs. a browser



Hi all,

I use the $cs->to_utf8 conversion from MARC::Charset to display MARC Authority records 
in a browser, and the diacritics display properly there. But they don't display 
properly via STDOUT in my terminal window (I get two characters instead of one -- one 
with the letter and one with the accent mark). Am I doing something wrong? I'm using:

        binmode (STDOUT, ":utf8");

Is there any way around this problem, or is it a limitation of terminal displays?
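
One thing that may be worth trying, sketched here under assumptions: if the terminal 
cannot render a letter followed by a combining mark as one glyph, composing the 
string to NFC before printing replaces such pairs with a single precomposed 
character wherever Unicode defines one. This uses the core Unicode::Normalize 
module. It assumes the terminal is set to UTF-8 and has a font with the precomposed 
glyphs; pairs with no precomposed form stay as two characters.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

binmode(STDOUT, ":encoding(UTF-8)");

my $decomposed = "e\x{0301}";        # "e" + COMBINING ACUTE ACCENT
my $composed   = NFC($decomposed);   # precomposed form, where one exists

printf "before: %d character(s), after: %d character(s)\n",
    length($decomposed), length($composed);
# prints "before: 2 character(s), after: 1 character(s)"

print "$composed\n";
```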

(I found a thread in the archives:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00280.html
that discusses a similar issue, but it didn't really answer my question).

Thanks!

-- Chris Morgan



