> From: Galen Charlton [mailto:galen.charl...@liblime.com]
> Sent: Wednesday, January 07, 2009 11:47 AM
> To: Michael Lackhoff
> Cc: perl4lib@perl.org
> Subject: Re: How to convert from ANSEL/MARC-8 to UTF-8?
>
> On Wed, Jan 7, 2009 at 11:42 AM, Michael Lackhoff
> <lackh...@fh-muenster.de> wrote:
> > diakritics + base char to the combined character. So I still have two
> > characters for e.g. the
> > German umlauts. This might be correct UTF-8 but is not useable to
> > present in (X)HTML.
I just cannot let that go. UTF-8 *is* Unicode, encoded in a particular way. Whether the characters are combined or uncombined does not matter to (X)HTML. What matters is that you declare the document's Unicode encoding (e.g., UTF-8, UTF-16BE, UTF-16LE) and that the user agent (e.g., a browser) understands Unicode. I think the (X)HTML standards require that.

Your browser should be able to handle both combined and uncombined characters. However, uncombined characters (a base letter followed by a separate combining diacritic) may not display properly because of font rendering issues. That is why you might want to precompose any uncombined characters in your (X)HTML, that is, convert the text to Unicode Normalization Form C (NFC).

Andy.
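The following minimal sketch shows what "precomposing" means in practice. It is written in Python's standard `unicodedata` module rather than Perl, and the function name `to_nfc` and the sample word are illustrative only. In Perl, the `NFC` function from the `Unicode::Normalize` module does the same job. A decomposed "ü" (a "u" followed by U+0308 COMBINING DIAERESIS) is valid UTF-8 as it stands. NFC merely folds the two code points into the single precomposed U+00FC.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


# Hypothetical sample: "Münster" with the umlaut stored as base letter + combining mark,
# the decomposed form described in the quoted message above.
decomposed = "Mu\u0308nster"
composed = to_nfc(decomposed)

# Both strings are valid UTF-8; only the number of code points and the bytes differ.
print(len(decomposed), len(composed))
print(decomposed.encode("utf-8"))
print(composed.encode("utf-8"))
print(composed == "M\u00fcnster")
```

Normalizing just before output keeps the stored data untouched while giving browsers the precomposed form that fonts tend to render most reliably.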