2014-03-22 1:04 GMT+01:00 Richard Wordingham < [email protected]>:
> On Thu, 20 Mar 2014 05:59:49 +0100
> Philippe Verdy <[email protected]> wrote:
>
> Not all Indic diacritics have combining class 0, and Hebrew diacritics
> have non-zero combining classes.

Did I say something else? You have probably misread me. I wrote "distinct and non-zero". You forgot the "AND", which is important: it gives the condition under which combining characters may be reordered during normalization, so that their relative encoding order is unpredictable (independently of whether they may be precomposed).

So if you enter <C, CEDILLA, ACUTE> or <C, ACUTE, CEDILLA>, you get some encoding in the editor's backing store. It may be precomposed or not, and its diacritics are not necessarily in normalized order; all 4 possible encodings are canonically equivalent. Then if you press Backspace, the effect should not depend on whether you just entered these keystrokes or loaded the text and clicked after the sequence before pressing Backspace. How can you predict which character to remove?

That is why it should delete BOTH the CEDILLA and the ACUTE here: they have distinct, non-zero combining classes, and so are unordered.

The rationale holds as well for Hebrew points (most of them have distinct non-zero combining classes when they are used in sequences). But it does not apply to "diacritics" (combining characters, or joiner controls like CGJ, ZWJ and ZWNJ, and possibly even some other format controls) that have combining class 0, because their encoding order is significant, so you know where to stop the effect of Backspace.

I see absolutely no reason why Backspace should arbitrarily delete only the last encoded character when users cannot even count them, may not have input them separately, or could expect them to have been typed in a different order.
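As an illustration of the equivalence claimed above, here is a minimal Python sketch using the standard `unicodedata` module. The names `encodings`, `canonical_forms` and `combining_classes` are made up for this example; it only checks that the four encodings share one NFD form and shows the combining classes involved.

```python
import unicodedata

# Four encodings of "C with cedilla and acute" (names are illustrative only).
encodings = [
    "C\u0327\u0301",  # C, COMBINING CEDILLA, COMBINING ACUTE ACCENT
    "C\u0301\u0327",  # C, COMBINING ACUTE ACCENT, COMBINING CEDILLA
    "\u00C7\u0301",   # C WITH CEDILLA, COMBINING ACUTE ACCENT
    "\u0106\u0327",   # C WITH ACUTE, COMBINING CEDILLA
]

def canonical_forms(strings):
    """Return the set of distinct NFD forms of the given strings."""
    return {unicodedata.normalize("NFD", s) for s in strings}

def combining_classes(s):
    """Return the canonical combining class of each code point of NFD(s)."""
    return [unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", s)]

print(len(canonical_forms(encodings)))   # 1: all four are canonically equivalent
print(combining_classes(encodings[1]))   # [0, 202, 230]: cedilla sorts before acute
```

Because the cedilla (class 202) and the acute (class 230) have distinct non-zero classes, normalization is free to swap them, so the stored order says nothing about the order in which they were typed.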
So yes, entering:

  <CEDILLA DEADKEY, ACUTE DEADKEY, C, BACKSPACE>, or
  <ACUTE DEADKEY, CEDILLA DEADKEY, C, BACKSPACE>, or
  <ACUTE DEADKEY, C WITH CEDILLA, BACKSPACE>, or
  <CEDILLA DEADKEY, C WITH ACUTE, BACKSPACE>

should all result in keeping only the letter C in the backing store.

And with an IME supporting a Compose key this will also be true:

  <COMPOSE, C, CEDILLA, ACUTE, BACKSPACE>, or
  <COMPOSE, C, ACUTE, CEDILLA, BACKSPACE>, or
  <COMPOSE, C WITH CEDILLA, ACUTE, BACKSPACE>, or
  <COMPOSE, C WITH ACUTE, CEDILLA, BACKSPACE>

Canonical equivalence should be respected in visual editing modes. Deleting only the "last" encoded diacritic should be done only in specific non-visual editing modes (with "visible controls"), and it is not expected that most users will like this editing mode.
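The Backspace behaviour argued for here could be sketched roughly as follows in Python. This is only one possible reading of the rule, not the behaviour of any existing editor: the function name `backspace` is hypothetical, and normalizing the whole string to NFD is a simplification (a real editor would presumably touch only the text near the caret).

```python
import unicodedata

def backspace(text: str) -> str:
    """Hypothetical Backspace that respects canonical equivalence.

    Assumption: the caret is at the end of `text`, and the result is
    returned in NFD for simplicity.
    """
    nfd = unicodedata.normalize("NFD", text)
    if not nfd:
        return nfd
    if unicodedata.combining(nfd[-1]) == 0:
        # Class 0 (letters, CGJ, ZWJ, ZWNJ, ...): encoding order is
        # significant, so delete just this one code point.
        return nfd[:-1]
    # Otherwise delete the trailing run of marks whose combining classes
    # are non-zero and pairwise distinct: their order is not significant.
    seen = set()
    i = len(nfd)
    while i > 0:
        ccc = unicodedata.combining(nfd[i - 1])
        if ccc == 0 or ccc in seen:
            break
        seen.add(ccc)
        i -= 1
    return nfd[:i]

# All four canonically equivalent encodings leave only the letter C.
for s in ("C\u0327\u0301", "C\u0301\u0327", "\u00C7\u0301", "\u0106\u0327"):
    assert backspace(s) == "C"

# Two marks of the same class (acute, grave) keep their order,
# so only the last one is removed.
assert backspace("e\u0301\u0300") == "e\u0301"
```

Marks that share a combining class are never reordered by normalization, which is why the sketch stops at the first repeated class: there the encoding order is still meaningful and deleting one mark at a time is predictable.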

