On Thu, Jul 08, 2004 at 01:17:48PM -0400, Houghton,Andrew wrote:
> Unicode specifies four normalization methods, NFC, NFD, NFKC,
> and NFKD. While RDF could have just accepted characters in
> unnormalized form, it decided to mandate that all data content
> be provided in NFC normalization form. T
> MARC-XML uses Unicode Normal form D, which means that the base
> character is separate from the diacritic.
I am not familiar with the MARC-XML specifications, so at the risk of
embarrassing myself would it be correct to posit that it may not be that
MARC-XML uses Unicode Normal form D, so much as
Hi Andy,
> From: Houghton,Andrew [mailto:[EMAIL PROTECTED]
>
> It just so happens that I have recently been converting
> MARC-XML to RDF. The RDF specification mandates Unicode
> Normal form C, which means that the base character and the
> diacritic are combined.
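
The difference between the two forms can be seen in a few lines. This is only a minimal sketch using Python's standard `unicodedata` module, not anything taken from the MARC-XML or RDF tooling discussed in the thread; the sample string is an assumption chosen for illustration.

```python
import unicodedata

# Decomposed input, as Normal Form D would carry it:
# base letter 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# Normal Form C combines base character and diacritic where a
# precomposed code point exists (here U+00E9).
composed = unicodedata.normalize("NFC", decomposed)

# Going back to Normal Form D separates them again.
round_trip = unicodedata.normalize("NFD", composed)

print(len(decomposed), len(composed), len(round_trip))
```

Both strings render identically; only the number of code points differs, which is why a conversion step is needed when the target format mandates one form.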
That's rather unfortunate, s
> A MARC-8 sequence places a combining diacritical mark BEFORE the letter
> it's supposed to combine. Whereas Unicode syntax is to put it AFTER the
> letter it's supposed to combine with.
>
> Hence for example the letter: Ẕ
> is produced by the MARC-8 Sequence:
> 75 5A (macron below + "Z")
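
The reordering described above can be sketched as follows. This is a hedged illustration, not what MARC::Charset does internally: it assumes the MARC-8 bytes have already been mapped to Unicode code points one by one (so the combining mark still sits before its letter), and the helper name `reorder_marks` is hypothetical.

```python
import unicodedata

def reorder_marks(text: str) -> str:
    """Move combining marks that precede a base character to after it."""
    out = []
    pending = []  # marks seen so far that are still waiting for a base
    for ch in text:
        if unicodedata.combining(ch):
            # Nonzero combining class: hold the mark until its base arrives.
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    # Marks with no following base character are kept at the end unchanged.
    out.extend(pending)
    return "".join(out)

# MARC-8 order: macron below (U+0331) first, then "Z".
marc_order = "\u0331Z"
unicode_order = reorder_marks(marc_order)
```

After this step the string is in Unicode order (letter, then mark) and can be passed through a normalization routine if a composed form is required.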
an to do that. Thanks again!
- Chris
_
From: Jacobs, Jane W [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 01, 2004 1:51 PM
To: 'Christopher Morgan'
Subject: RE: Displaying diacritics in a terminal vs. a browser
Hi Chris,
I hope my analysis is correct; I think that two pro
On Thu, Jul 01, 2004 at 11:22:42AM -0400, Houghton,Andrew wrote:
> I'm not sure what MARC::Charset does internally, but MARC-8
> defines the diacritic separate from the base character. So
> even using binmode(STDOUT,":utf8") will produce two characters,
> one for the base character followed by t
> From: Paul Hoffman [mailto:[EMAIL PROTECTED]
> Sent: 01 July, 2004 11:57
> Subject: Re: Displaying diacritics in a terminal vs. a browser
>
> Unless I'm very much mistaken, Chris's code is outputting
> UTF-8 to the terminal, not MARC-8.
> >> From: Christ
Unless I'm very much mistaken, Chris's code is outputting UTF-8 to
the terminal, not MARC-8.
The key is to find a terminal program that correctly displays UTF-8.
I doubt you'll have any trouble finding one -- for example, there
are at least two for Mac OS X alone (Terminal.app and iTerm).
Depending
Andy,
Many thanks. I'd be interested in looking at your scripts if you do post
them!
-- Chris
-----Original Message-----
From: Houghton,Andrew [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 01, 2004 10:23 AM
To: [EMAIL PROTECTED]
Subject: RE: Displaying diacritics in a terminal vs. a br
> From: Christopher Morgan [mailto:[EMAIL PROTECTED]
> Sent: 01 July, 2004 10:50
> Subject: Displaying diacritics in a terminal vs. a browser
>
> I use the $cs->to_utf8 conversion from MARC::Charset to
> display MARC Authority records in a browser, and the
> diacritics display properly there.
>