On October 16th Damian Conway wrote: 
 > If the contents are not a number, they are interpreted as an upper-case
 > Unicode character name, or as a lower-case XHTML entity. For example:

One more problem:  not all XHTML entities are lower-case.  For example:

 &ETH; &THORN; &Eacute; &Theta;

For a complete list, see:

http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities


I was thinking that we could distinguish them because Unicode character
names are always multiple words, but a quick search turned up ANGLE
(U+2220), so that won't work.
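To illustrate, here is a minimal Python sketch of that word-count heuristic. The helper `looks_like_unicode_name` is hypothetical, and the standard `unicodedata.lookup` call confirms that ANGLE really is a one-word Unicode name:

```python
import unicodedata

def looks_like_unicode_name(contents):
    # Hypothetical heuristic: treat E<...> contents as a Unicode
    # character name only if they consist of more than one word.
    return len(contents.split()) > 1

# A multi-word name is classified as Unicode, as intended.
print(looks_like_unicode_name("LATIN CAPITAL LETTER ETH"))  # True

# ANGLE is a genuine one-word Unicode name, so the heuristic misses it.
print(looks_like_unicode_name("ANGLE"))                # False
print(f"U+{ord(unicodedata.lookup('ANGLE')):04X}")     # U+2220
```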

We could special-case ETH and THORN (the only all-uppercase entities)
and require translators to recognize them as entities.
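A rough sketch of what a translator would do under this rule, assuming the contents have already been found not to be a number. The function name is hypothetical, and Python's `html.entities.name2codepoint` (the HTML 4 entity table, which matches XHTML 1.0 for these names) stands in for the XHTML entity list:

```python
import unicodedata
from html.entities import name2codepoint

# The only all-uppercase XHTML entity names.
UPPERCASE_ENTITIES = {"ETH", "THORN"}

def resolve_special_case(contents):
    """Hypothetical resolver for non-numeric E<...> contents."""
    if contents in UPPERCASE_ENTITIES or any(c.islower() for c in contents):
        # Special-cased names, or any name containing lowercase: an entity.
        return chr(name2codepoint[contents])
    # Everything else is taken as an upper-case Unicode character name.
    return unicodedata.lookup(contents)

for name in ["ETH", "THORN", "Eacute", "ANGLE"]:
    print(name, f"U+{ord(resolve_special_case(name)):04X}")
# ETH U+00D0
# THORN U+00DE
# Eacute U+00C9
# ANGLE U+2220
```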

We could allow an ampersand to indicate that it's an entity reference:
E<&ETH> and E<&THORN>.  The ampersand would be optional if the entity
name contains lowercase:  either E<&Eacute> or E<Eacute> would be ok.
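Under the ampersand proposal the decision no longer needs a special-case list. A minimal sketch, again with a hypothetical function name and with `html.entities.name2codepoint` standing in for the XHTML entity table:

```python
import unicodedata
from html.entities import name2codepoint

def resolve_with_ampersand(contents):
    """Hypothetical resolver for non-numeric E<...> contents."""
    if contents.startswith("&"):
        # An explicit ampersand always marks an entity: E<&ETH>, E<&Eacute>.
        return chr(name2codepoint[contents[1:]])
    if any(c.islower() for c in contents):
        # The ampersand is optional when the name contains lowercase.
        return chr(name2codepoint[contents])
    # Otherwise the contents are an upper-case Unicode character name.
    return unicodedata.lookup(contents)

for code in ["&ETH", "&THORN", "&Eacute", "Eacute", "ANGLE"]:
    print(code, f"U+{ord(resolve_with_ampersand(code)):04X}")
# &ETH U+00D0
# &THORN U+00DE
# &Eacute U+00C9
# Eacute U+00C9
# ANGLE U+2220
```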

We could disallow E<ETH> & E<THORN> and require the Unicode names:
E<LATIN CAPITAL LETTER ETH> & E<LATIN CAPITAL LETTER THORN>.
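With this option, all-uppercase contents are always Unicode names, and the long names yield exactly the characters the entities would have. A minimal sketch (hypothetical function name; `html.entities.name2codepoint` stands in for the XHTML entity table):

```python
import unicodedata
from html.entities import name2codepoint

def resolve_unicode_only(contents):
    """Hypothetical resolver: only names containing lowercase are entities."""
    if any(c.islower() for c in contents):
        return chr(name2codepoint[contents])
    return unicodedata.lookup(contents)

# The required Unicode names produce the same characters as &ETH; and &THORN;.
for entity, uname in [("ETH", "LATIN CAPITAL LETTER ETH"),
                      ("THORN", "LATIN CAPITAL LETTER THORN")]:
    print(entity, resolve_unicode_only(uname) == chr(name2codepoint[entity]))
# ETH True
# THORN True
```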

-- 
Chris Madsen                                            [EMAIL PROTECTED]
  ------------------  http://www.pobox.com/~cjm  ------------------
