> MARC-XML uses Unicode Normal form D, which means that the base
> character is separate from the diacritic.

I am not familiar with the MARC-XML specifications, so at the risk of
embarrassing myself, would it be correct to posit that it is not so much
that MARC-XML uses Unicode Normalization Form D (NFD) as that the MARC 21
UCS/Unicode environment is essentially the MARC-8 character repertoire
translated into the equivalent Unicode code points [1]?  Since the MARC-8
character repertoire relies largely on combining characters, the end result
will mostly be in NFD.  However, there *are* exceptions.  One example is
UPPERCASE O-HOOK, which is a single character in MARC-8 (hex AC) and
therefore a precomposed character in MARC UCS/Unicode (hex 01A0) [and
therefore, I assume, in MARC-XML], even though the NFD form of that
character is the decomposed sequence hex 004F 031B.
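To make the distinction concrete, here is a minimal Python sketch.  The
two-entry table and the `marc8_to_unicode` function are hypothetical
illustrations, not the full MARC 21 mapping or any library's API; only
`unicodedata` comes from the standard library.  It assumes the MARC-8
convention that a combining diacritic precedes its base character, whereas
Unicode places combining marks after the base.

```python
import unicodedata

# Hypothetical, tiny subset of the MARC-8 -> Unicode mapping
# (illustration only; NOT the complete MARC 21 table).
MARC8_TO_UCS = {
    0xAC: "\u01A0",  # UPPERCASE O-HOOK -> precomposed LATIN CAPITAL LETTER O WITH HORN
    0xE2: "\u0301",  # MARC-8 combining acute -> COMBINING ACUTE ACCENT
}
COMBINING = {0xE2}  # combining diacritics within this tiny subset


def marc8_to_unicode(data: bytes) -> str:
    """Translate MARC-8 bytes using the tiny table above.

    MARC-8 stores combining diacritics *before* the base character and
    Unicode stores them *after*, so pending marks are held back and
    emitted right after the next base character.
    """
    out = []
    pending = []
    for b in data:
        if b in COMBINING:
            pending.append(MARC8_TO_UCS[b])
        else:
            # ASCII bytes map to themselves; others come from the table.
            out.append(chr(b) if b < 0x80 else MARC8_TO_UCS[b])
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # trailing marks with no base (malformed input)
    return "".join(out)


for raw in (b"\xE2e", b"\xAC"):
    s = marc8_to_unicode(raw)
    print([f"{ord(c):04X}" for c in s], unicodedata.is_normalized("NFD", s))
# ['0065', '0301'] True    <- acute + e comes out decomposed (NFD)
# ['01A0'] False           <- O-hook comes out precomposed (not NFD)
```

In this sketch, `unicodedata.normalize("NFD", "\u01A0")` yields the
decomposed sequence U+004F U+031B, which is why a one-to-one translation of
MARC-8 is "mostly" but not entirely NFD.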

I have been trying to learn about character sets, especially with regard to
MARC and library environments, and have put some (hopefully) useful
information on the web [2].  Included are a technical primer for librarians
and extensive code charts/matrices for MARC character sets, as well as a
fairly decent list of web resources [3].  Note that the PowerPoint slide
show is of limited use without the original commentary, and it is a very
large file because it includes embedded fonts.

[1] Coded Character Sets > A Technical Primer for Librarians > MARC Unicode
    http://rocky.uta.edu/doran/charsets/unicode.html

[2] Coded Character Sets
    http://rocky.uta.edu/doran/charsets/

[3] Resources on the Web: With an emphasis on library automation and the
internet
    http://rocky.uta.edu/doran/charsets/resources.html
    
BTW, the earlier message I sent to the list had an unfinished sentence.  I
should have proofread before sending and I apologize.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-239-5368 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/ 

