> MARC-XML uses Unicode Normal form D, which means that the base
> character is separate from the diacritic.

I am not familiar with the MARC-XML specifications, so at the risk of embarrassing myself: would it be correct to posit that it is not so much that MARC-XML uses Unicode Normal Form D as that the MARC 21 UCS/Unicode environment is essentially the MARC-8 character repertoire translated into the equivalent Unicode code points [1]? Since the MARC-8 character repertoire relies largely on combining characters, the end result will mostly be Unicode Normal Form D.

However, there *are* exceptions. One example is UPPERCASE O-HOOK, which is a single character in MARC-8 (hex AC) and therefore a precomposed character in MARC UCS/Unicode (hex 01A0) [and therefore, I assume, in MARC-XML], even though there is a decomposed (i.e. Normal Form D) Unicode version of that character (hex 004F 031B).

I have been trying to learn about character sets, especially with regard to MARC and library environments, and have put some (hopefully) useful information on the web [2]. Included are a technical primer for librarians and extensive code charts/matrices for MARC character sets. There is also a fairly decent list of web resources [3]. Note that the PowerPoint slide show is of limited use without the original commentary and is a huge file because it includes embedded fonts.

[1] Coded Character Sets > A Technical Primer for Librarians > MARC Unicode
    http://rocky.uta.edu/doran/charsets/unicode.html

[2] Coded Character Sets
    http://rocky.uta.edu/doran/charsets/

[3] Resources on the Web: With an emphasis on library automation and the internet
    http://rocky.uta.edu/doran/charsets/resources.html

BTW, the earlier message I sent to the list had an unfinished sentence. I should have proofread before sending, and I apologize.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-239-5368 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
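
P.S. For anyone who wants to check the o-hook exception for themselves, here is a minimal sketch, assuming Python 3 and its standard unicodedata module. The helper name describe() is my own invention, not part of any MARC tool. It reports the Unicode name of a character, the code points of its Normal Form D decomposition, and whether the character as given is already in Normal Form D. I have included both the uppercase and lowercase o-hook ("O WITH HORN" in Unicode naming) so the two are easy to tell apart.

```python
import unicodedata


def describe(ch):
    """Return (Unicode name, NFD code points as hex, True if ch is already NFD)."""
    nfd = unicodedata.normalize("NFD", ch)
    code_points = " ".join(f"{ord(c):04X}" for c in nfd)
    return unicodedata.name(ch), code_points, nfd == ch


# Precomposed uppercase and lowercase o-hook (hypothetical sample input).
for ch in ("\u01A0", "\u01A1"):
    name, nfd_points, already_nfd = describe(ch)
    print(f"U+{ord(ch):04X} {name}: NFD = {nfd_points}, already NFD: {already_nfd}")
```

Both precomposed characters should come back as not being in Normal Form D, which is the point: a record that carries them over from MARC-8 one-to-one is not strictly Normal Form D, even if almost everything else in it is.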