> From: Galen Charlton [mailto:galen.charl...@liblime.com]
> Sent: Wednesday, January 07, 2009 11:47 AM
> To: Michael Lackhoff
> Cc: perl4lib@perl.org
> Subject: Re: How to convert from ANSEL/MARC-8 to UTF-8?
> 
> On Wed, Jan 7, 2009 at 11:42 AM, Michael Lackhoff
> <lackh...@fh-muenster.de> wrote:
> > diakritics + base char to the combined character. So I still have two
> > characters for e.g. the
> > German umlauts. This might be correct UTF-8 but is not useable to
> > present in (X)HTML.

I just cannot let that go.  UTF-8 *is* Unicode, encoded in a particular way.
Whether the characters are combined (precomposed) or uncombined (base
character plus combining diacritic) is not relevant to (X)HTML, so long as
you declare that the document is in a Unicode encoding, e.g., UTF-8,
UTF-16BE, or UTF-16LE, and the user agent, e.g., a browser, understands
Unicode, which I think is a requirement of the (X)HTML standards.  Your
browser should be able to deal with either form.  However, uncombined
characters may not display properly due to font rendering issues, which is
why you might be inclined to pre-compose them in your (X)HTML, i.e., convert
the text to Unicode Normalization Form C (NFC).
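
To make the NFC step concrete, here is a minimal sketch in Python using the
standard unicodedata module.  The sample string is my own illustration, not
data from the original records, and the same idea should carry over to Perl
via the NFC function in Unicode::Normalize.

```python
import unicodedata

# "u" followed by U+0308 COMBINING DIAERESIS: two code points, one visible character.
decomposed = "u\u0308ber"

# NFC composes base character + combining mark into one code point where one exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 5 4
print(composed == "\u00fcber")         # True

# Both strings are valid Unicode; only the composed one uses U+00FC for the umlaut.
utf8_bytes = composed.encode("utf-8")
```

Either form is legal in a UTF-8 document; normalizing to NFC only sidesteps
renderers and fonts that position combining marks poorly.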


Andy.
