On 8/7/2019 5:33 PM, Andrew Glass via Unicode wrote:

I agree and understand that accurate representation is important in this case. It would be good to understand how widespread the issue is in order to begin to justify the work to retrofit shaping with normalization. The number of problematic strings may be small but the risk of regression in this case might be quite large.

Not sure how to quantify this. Potentially every URL (assuming that local users eventually migrate to non-ASCII domains). Then again, not all of these will be normalized in the document.

I don't know the precise behavior of address bar / status bar. I know that when you type in an uppercase ASCII domain name, it will resolve, but the lower case name is echoed.

Can't tell immediately whether that means that for names that are normalized for lookup, you also get the canonical name displayed. If so, then every single (local) URL in those scripts is potentially affected.

A./


 

Cheers,

 

Andrew

 

From: Asmus Freytag (c) <asm...@ix.netcom.com>
Sent: 07 August 2019 17:17
To: Andrew Glass <andrew.gl...@microsoft.com>; Unicode Mailing List <unicode@unicode.org>
Subject: Re: What is the time frame for USE shapers to provide support for CV+C ?

 

On 8/7/2019 5:08 PM, Andrew Glass wrote:

Shaping domain names is a new requirement. It would be good to understand the specific cases that are falling in the gap here.

Domain names are simply strings, but the protocol enforces normalization to NFC. In some situations, it might be possible for a browser, for example, to have access to the user-provided string, but I can see any number of situations where the actual string (as stored in the DNS) would need to be displayed.

For this scenario, it does not matter whether it's NFC or NFD; what matters is that some particular un-normalized state would be lost, and it would be bad if, as a result, the string could no longer be rendered correctly.

This matters in particular because the strings in question are identifiers, where accurate recognition is paramount.
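
As a minimal illustration of the gap between the user-provided string and the stored one (not a description of any particular browser or resolver), the Python sketch below compares a label with its NFC form. The label and the helper name nfc_changes_label are made up for the example.

```python
import unicodedata

def nfc_changes_label(label: str) -> bool:
    """Return True if NFC normalization alters the code point sequence."""
    return unicodedata.normalize("NFC", label) != label

# Hypothetical user input: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
typed = "cafe\u0301"

# What a protocol that enforces NFC would keep instead of the typed string.
stored = unicodedata.normalize("NFC", typed)

print(nfc_changes_label(typed), [hex(ord(ch)) for ch in stored])
```

For Latin the two forms normally render alike; the concern here is scripts where a shaper only handles one of the two orders.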

A./

 

From: Unicode <unicode-boun...@unicode.org> On Behalf Of Asmus Freytag via Unicode
Sent: 07 August 2019 14:19
To: unicode@unicode.org
Subject: Re: What is the time frame for USE shapers to provide support for CV+C ?

 

What about text that must exist normalized for other purposes?

 

Domain names must be normalized to NFC, for example. Will such strings display correctly if passed to USE?

 

A./

 

On 8/7/2019 1:39 PM, Andrew Glass via Unicode wrote:

That's correct, the Microsoft implementation of the USE spec does not normalize as part of the shaping process.
Why? Because the ccc system for non-Latin scripts is not a good mechanism for handling the complex requirements of these writing systems, and the effects of ccc-based normalization can disrupt the author's intent. Unfortunately, because we cannot fix ccc values, shaping engines at Microsoft have ignored them. Therefore, the recommendation for passing text to USE is to not normalize.
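
For reference, the ccc values in question can be inspected with Python's unicodedata module. This is only a sketch for looking at the data; the two characters are the Tai Tham marks discussed elsewhere in this thread, and the variable names are arbitrary.

```python
import unicodedata

SAKOT = unicodedata.lookup("TAI THAM SIGN SAKOT")
TONE_2 = unicodedata.lookup("TAI THAM SIGN TONE-2")

# unicodedata.combining() returns the canonical combining class (ccc).
ccc = {unicodedata.name(ch): unicodedata.combining(ch) for ch in (SAKOT, TONE_2)}

for name, value in ccc.items():
    print(f"{name}: ccc={value}")
```

Canonical ordering sorts adjacent marks with non-zero ccc by ascending value, so SAKOT (ccc 9) is moved in front of a tone mark (ccc 230) whenever the two are adjacent.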
 
By the way, at the current time, I do not have a final consensus from Tai Tham experts and community on the changes required to support Tai Tham in USE. Therefore, I've not been able to make the changes proposed in this thread.
 
Cheers,
 
Andrew
 
-----Original Message-----
From: Richard Wordingham <richard.wording...@ntlworld.com> 
Sent: 07 August 2019 13:29
To: Richard Wordingham via Unicode <unicode@unicode.org>
Cc: Andrew Glass <andrew.gl...@microsoft.com>
Subject: Re: What is the time frame for USE shapers to provide support for CV+C ?
 
On Tue, 14 May 2019 03:08:04 +0100
Richard Wordingham via Unicode <unicode@unicode.org> wrote:
 
On Tue, 14 May 2019 00:58:07 +0000
Andrew Glass via Unicode <unicode@unicode.org> wrote:
 
Here is the essence of the initial changes needed to support CV+C.
Open to feedback.
 
 
  *   Create new SAKOT class: SAKOT (Sk) based on UISC = Invisible_Stacker
  *   Reduced HALANT class: now only HALANT (H) based on UISC = Virama
  *   Updated Standard cluster mode
 
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB
[VS] (CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)*
(VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk
B)* (FAbv)* (FBlw)* (FPst)* [FM]
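
A rough sketch of what the trailing (Sk B)* slot allows, using Python's re module over class tokens. The pattern below is a heavily cut-down fragment of the expression above, not the full cluster model, and treating a tone mark as VMAbv is an assumption made for the example.

```python
import re

# Cut-down fragment: base, optional stacked bases, vowel modifiers above,
# then the new trailing (Sk B)* slot. Tokens are joined with single spaces.
CLUSTER = re.compile(r"B( Sk B)*( VMAbv)*( Sk B)*")

def is_single_cluster(classes):
    """Return True if the whole class sequence matches the fragment."""
    return CLUSTER.fullmatch(" ".join(classes)) is not None

# Mark first, then SAKOT + base: fits the trailing (Sk B)* slot.
mark_then_sakot = ["B", "VMAbv", "Sk", "B"]

# SAKOT first, then mark, then base: Sk is not directly followed by B.
sakot_then_mark = ["B", "Sk", "VMAbv", "B"]

print(is_single_cluster(mark_then_sakot), is_single_cluster(sakot_then_mark))
```

In this fragment, Sk has to be immediately followed by B, so any reordering that places a mark between them breaks the match.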
 
This next question does not, I believe, affect HarfBuzz.  Will NFC 
code render as well as unnormalised code?  In the first example above, 
<TONE-2, SAKOT, LOW YA> normalises to <SAKOT, TONE-2, LOW YA>, which 
does not match any portion of the regular expression.
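
The reordering itself can be checked with Python's unicodedata module. A minimal sketch, assuming the three characters meant are TAI THAM SIGN TONE-2, TAI THAM SIGN SAKOT and TAI THAM LETTER LOW YA; the helper names() is only for printing.

```python
import unicodedata

TONE_2 = unicodedata.lookup("TAI THAM SIGN TONE-2")
SAKOT = unicodedata.lookup("TAI THAM SIGN SAKOT")
LOW_YA = unicodedata.lookup("TAI THAM LETTER LOW YA")

# Sequence as typed: tone mark, then SAKOT, then the subjoined consonant.
original = TONE_2 + SAKOT + LOW_YA

# NFC applies canonical ordering to the adjacent combining marks.
nfc = unicodedata.normalize("NFC", original)

def names(s):
    """List the Unicode character names of a string, for display."""
    return [unicodedata.name(ch) for ch in s]

print(names(original))
print(names(nfc))
```

The same comparison with "NFD" in place of "NFC" gives the same order, since there are no compositions involved here, only canonical reordering.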
 
Could someone answer this question, please?  The USE documentation ("CGJ handling will need to be updated if USE is modified to support
normalization") still implies that the USE does not respect canonical equivalence.
 
Richard.
 
 

 

 


