On 3/22/2013 4:08 AM, Philippe Verdy wrote:
> 2013/3/22 Asmus Freytag <[email protected]>:
>> If you need to annotate text with the results of semantic analysis as
>> performed by a human reader, then you either need XML, or some other format
>> that can express that particular intent.
> Absolutely NO. If this encodes semantics, this is part of plain text,
I think we are on a different page here. In some ways the Unicode term
"semantics" is very misleading in this context. What Unicode means by
this fancy term is the character's identity - not its use.
If you use a colon to mark an abbreviation (as in Swedish), you are using a
colon - the use may be very different from how a colon is used
elsewhere, but it does not create a new character.
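A quick illustration in Python, using the standard unicodedata module: no
matter which convention the colon serves, it is the same code point with
the same name and the same general category.

```python
import unicodedata

# The colon marking a Swedish abbreviation and the colon introducing an
# English list are one and the same character:
colon = ":"
print(hex(ord(colon)))              # 0x3a
print(unicodedata.name(colon))      # COLON
print(unicodedata.category(colon))  # Po (Punctuation, other)
```

The differing uses live in the text around the character, not in its encoding.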
Unicode does not encode the semantics of a sentence or word, but
provides a string of characters of known identity that lets a human
reader determine the semantics of that sentence or word as unambiguously
as if that sentence had been reproduced by analog means - that's, in a
nutshell, what Unicode attempts to do.
> and not part of an upper layer protocol. Notably these characters
> should be used to alter the default (ambiguous) character properties of
> the characters they modify, and notably to give them the semantics
> needed for existing Unicode algorithms (general categories:
> punctuation, diacritic; word-breaking properties...)
Character properties define the *default* behavior of a given
character. There are many examples, especially in the context of
punctuation where a character can have different uses. Each use may need
a different treatment by readers (or algorithms).
To handle some behaviors, you may need complex processing (natural
language processing) that mimics what a human reader can do.
There are a few exceptions where characters are disunified based on
properties - the most principled of these involve properties that can't
be modified, such as the bidi property. There are about a dozen
characters that look entirely alike (by design and derivation) yet have
been disunified based on bidi properties - because bidi properties
cannot be overridden.
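For anyone who wants to see the property in question, the Bidi_Class of
any character can be inspected directly. A small Python sketch (the
characters below are merely illustrations of distinct bidi classes, not
the disunified look-alike pairs themselves):

```python
import unicodedata

# Bidi_Class is normative and cannot be overridden by markup or
# higher-level protocols (only the explicit bidi controls affect it).
print(unicodedata.bidirectional("A"))       # L  (Left-to-Right)
print(unicodedata.bidirectional("\u05D0"))  # R  (HEBREW LETTER ALEF, Right-to-Left)
print(unicodedata.bidirectional("0"))       # EN (European Number)
print(unicodedata.bidirectional("\u0660"))  # AN (ARABIC-INDIC DIGIT ZERO, Arabic Number)
```

Because this property cannot be switched after the fact, two look-alikes
needing different bidi behavior can only be separate code points.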
There are a few other cases, usually where a character can be both a
letter and punctuation, where such disunifications were made based on
overridable properties. Here the reason was that this distinction has
such a wide reach (and had to be applied by many basic algorithms) that
breaking the principle of single character identity can be justified.
If a problem is sufficiently severe, then you'd possibly have
justification to disunify. If not, then the answer would be outside the
scope of character encoding.
> adding new variants of existing characters like what was done
> specifically for maths is not a stable long term solution; solutions
> similar to variant selectors however are much more meaningful, and
> will allow for example to make the distinction between a MIDDLE DOT
> punctuation and an ANO TELEIA, and will also allow them to be rendered
> differently (even if there's no requirement to do so).
> This is absolutely not "pseudo-coding".
"Pseudo-coding" refers to making distinctions between characters not in
their basic encoding, but by means of "attributes" such as the selectors
you are suggesting.
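As an aside on the MIDDLE DOT / ANO TELEIA example: those two are already
canonically equivalent (U+0387 has a singleton canonical decomposition to
U+00B7), so any distinction carried by the choice of code point is erased
by normalization - easy to verify in Python:

```python
import unicodedata

ano_teleia = "\u0387"  # GREEK ANO TELEIA
middle_dot = "\u00B7"  # MIDDLE DOT

print(unicodedata.name(ano_teleia))           # GREEK ANO TELEIA
print(unicodedata.decomposition(ano_teleia))  # 00B7 (canonical singleton)

# All normalization forms collapse ANO TELEIA to MIDDLE DOT:
print(unicodedata.normalize("NFC", ano_teleia) == middle_dot)  # True
```

Any mechanism that tried to keep them apart would therefore have to
survive normalization, which is exactly the kind of constraint a
variation-sequence-style proposal would need to address.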