On October 16th Damian Conway wrote:
> If the contents are not a number, they are interpreted as an upper-case
> Unicode character name, or as a lower-case XHTML entity. For example:
One more problem: not all XHTML entities are lower-case. For example:

  &ETH; &THORN; &Eacute; &Theta;

For a complete list, see:
http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities

I was thinking that we could tell them apart because Unicode character
names always consist of multiple words, but a quick search turned up
ANGLE (U+2220), a single-word name, so that won't work.

We could special-case ETH and THORN (the only all-uppercase entities)
and require translators to recognize them as entities.

We could allow an ampersand to indicate an entity reference: E<&ETH;>
and E<&THORN;>. The ampersand would be optional if the entity name
contains a lowercase letter: either E<&Eacute;> or E<Eacute> would be OK.

We could disallow E<ETH> and E<THORN> and require the Unicode names:
E<LATIN CAPITAL LETTER ETH> and E<LATIN CAPITAL LETTER THORN>.

--
Chris Madsen    [EMAIL PROTECTED]
------------------ http://www.pobox.com/~cjm ------------------
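For illustration, here is a minimal sketch of the ampersand proposal above, written in Python rather than Perl. Assumptions: `resolve_e_code` is a hypothetical name, numeric contents are treated as plain decimal only, and the entity table is Python's `html.entities.name2codepoint` (the HTML 4 entity set, which XHTML 1.0 also defines). It is not the Pod spec's actual behaviour, just one way the rule could be applied.

```python
import html.entities
import unicodedata


def resolve_e_code(contents: str) -> str:
    """Hypothetical resolver for the contents of an E<...> code.

    Sketches the "optional ampersand" proposal:
      - a leading '&' marks an entity reference (a trailing ';' is dropped);
      - the '&' may be omitted if the name contains a lowercase letter;
      - anything else is treated as an upper-case Unicode character name.
    Unknown names raise KeyError from the underlying lookup.
    """
    # Simplification (assumption): numbers are decimal code points only.
    if contents.isdecimal():
        return chr(int(contents))

    # Explicit entity reference, e.g. E<&ETH;> or E<&THORN>.
    if contents.startswith("&"):
        name = contents[1:]
        if name.endswith(";"):
            name = name[:-1]
        return chr(html.entities.name2codepoint[name])

    # A lowercase letter rules out a Unicode name, e.g. E<Eacute>, E<ang>.
    if any(ch.islower() for ch in contents):
        return chr(html.entities.name2codepoint[contents])

    # Otherwise an upper-case Unicode name, e.g. E<ANGLE>.
    return unicodedata.lookup(contents)
```

Under this rule E<&ETH;> and E<Eacute> resolve as entities and E<ANGLE> as a Unicode name, while a bare E<ETH> falls through to the Unicode-name lookup rather than the entity table. That fall-through is exactly the ambiguity the ampersand is meant to remove.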