On Wed, 7 Aug 2019 14:19:26 -0700 Asmus Freytag via Unicode <unicode@unicode.org> wrote:
> What about text that must exist normalized for other purposes?
>
> Domain names must be normalized to NFC, for example. Will such
> strings display correctly if passed to USE?

One solution, of course, is to minimise the use of Microsoft products.
(The trick is to apply the normalisation algorithm using a permutation
of the positive ccc values.) The latest version of HarfBuzz renders
subscripted final consonants; it's slowly recovering its pre-USE
rendering capabilities.

> On 8/7/2019 1:39 PM, Andrew Glass via Unicode wrote:
> That's correct, the Microsoft implementation of USE spec does not
> normalize as part of the shaping process. Why? Because the ccc system
> for non-Latin scripts is not a good mechanism for handling complex
> requirements for these writing systems and the effects of ccc-based
> normalization can disrupt authors' intent. Unfortunately, because we
> cannot fix ccc values, shaping engines at Microsoft have ignored
> them. Therefore, the recommendation for passing text to USE is to not
> normalize.

HarfBuzz solved the problem of <tone, sakot> by choosing a suitable
normalisation; it uses the same technique for Hebrew, where the
normalisation classes are also unfriendly to renderers.

> By the way, at the current time, I do not have a final consensus from
> Tai Tham experts and community on the changes required to support Tai
> Tham in USE. Therefore, I've not been able to make the changes
> proposed in this thread.

Grammatical denazification is one solution. Another one is to delegate
matters to the font. Give us a script type that will implement a GSUB
feature by default, and font writers can take it from there.

At present I have a conundrum on how to render the accusative singular
of the cruciform form of the word for enlightenment without using
chained syllables, _bodhiṃ_. The obvious visual encoding is <LOW PA,
LOW THA, SIGN E, SIGN I, MAI KANG, SIGN AA>. This combination is very
unusual, perhaps unique to this word.
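
The <tone, sakot> normalisation trick mentioned above (running the normalisation algorithm with a permutation of the positive ccc values) can be sketched in Python. This is a minimal illustration of the canonical reordering step only, not HarfBuzz's actual code. The permutation `CCC_PERMUTATION` and the names `permuted_ccc` and `permuted_nfd` are hypothetical: the sketch swaps ccc 9 (the class of TAI THAM SIGN SAKOT U+1A60) with 254, so that SAKOT sorts after the ccc-230 Tai Tham tone marks instead of before them.

```python
import unicodedata

# Hypothetical permutation of the positive ccc values: swap 9 and 254.
# Because it is a bijection, marks of equal class stay equal and marks of
# different classes stay different; only their relative order changes.
# Note: this moves every ccc-9 (virama-class) mark, not just Tai Tham SAKOT.
CCC_PERMUTATION = {9: 254, 254: 9}


def permuted_ccc(ch: str) -> int:
    """Return the permuted canonical combining class of ch (0 stays 0)."""
    ccc = unicodedata.combining(ch)
    return CCC_PERMUTATION.get(ccc, ccc)


def permuted_nfd(text: str) -> str:
    """Fully decompose text, then redo canonical reordering with permuted_ccc."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    run = []  # current run of characters with nonzero ccc
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            # sorted() is stable, so marks of the same class keep their order
            out.extend(sorted(run, key=permuted_ccc))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=permuted_ccc))
    return "".join(out)


if __name__ == "__main__":
    tone_sakot = "\u1a20\u1a75\u1a60"  # HIGH KA, TONE-1 (ccc 230), SAKOT (ccc 9)
    print(unicodedata.normalize("NFD", tone_sakot) == tone_sakot)  # False
    print(permuted_nfd(tone_sakot) == tone_sakot)  # True
```

Standard NFD moves SAKOT in front of the tone mark, whereas the permuted form keeps <tone, sakot> and maps <sakot, tone> onto it. Since the result is still computed from the NFD of the input, canonically equivalent strings continue to share a single representative.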
(Pali 'o' is written <SIGN E, SIGN AA>.) However, a very common
combination, because the UTC refused Tai Tham the character SIGN AM, is
SIGN AA, MAI KANG, so for the USE, SIGN AA and MAI KANG have to be in
the same character class. (Alternatively, we split the syllable before
SIGN AA.) MAI KANG has InSc=bindu, while SIGN AA is a right matra.
Unfortunately, there is a strong temptation for many to write what
would have been 'SIGN AM' as MAI KANG, SIGN AA, which is to be rendered
quite differently from 'SIGN AM' outside Northern Thailand, e.g. in NE
Thailand. (Northern Thailand has both styles; it is quite diverse.) If
I understand the principles of USE, allowing both '... MAI KANG, SIGN
AA ...' and '... SIGN AA, MAI KANG ...', which immediately after a
consonant have the same rendering in some fonts and very confusable
renderings in many others, is considered highly undesirable.

For Microsoft applications, another solution is for fonts to delete
dotted circles between Tai Tham characters. (I try to be more
selective, but this results in a complicated set of lookups to ensure
that deletion only occurs when the renderer has inserted inappropriate
dotted circles.) This is not compliant with Unicode, but neither is
deliberately treating canonically equivalent forms differently.

Richard.