I was looking at the feedback in http://www.unicode.org/review/pri355/, and didn't see yours there. Could you please file your feedback there? (Nothing on this list is tracked by the committee...)
FYI, I'm now thinking that the change should be:

GB9c: (Virama | ZWJ ) × LinkingConsonant
=>
GB9c: (Virama ViramaExtend* | ZWJ ) × LinkingConsonant

where ViramaExtend = [Extend - Virama - \p{ccc=0}]

(This is pre-partitioning.)

That is close to your formulation, but for canonical equivalence there shouldn't be a need to allow the ViramaExtend after ZWJ, because the ZWJ has ccc=0, and thus nothing reorders around it.

Cibu also pointed out on a different thread that for Malayalam we need to consider a couple of other forms:

...
Following contexts should be allowed for requesting reformed or traditional conjuncts as per Unicode10.0.0/ch12 page 505.
...
/$L ZWNJ $V $L/
/$L ZWJ $V $L/

The ZWJ Virama sequence is already provided for by the combination of GB9 & GB9c. But not the ZWNJ. If we want to handle that, it would mean the addition of something like:

GB9d: × (ZWNJ ViramaExtend* Virama)

Cibu also wrote:

Also, when we disallow /$L $V ZWJ $D/, it is disallowing the sequences involving legacy chillus. That is, for example, <CHILLU N, VOWEL SIGN E> is a valid sequence (Examples in Unicode10.0.0/ch12 Table 12.36). Its legacy equivalent would be <NA, VIRAMA, ZWJ, VOWEL SIGN E>. It might be OK to disallow this; but, we should be mindful of this side effect.

To account for the legacy cases, the simplest approach might be to add some characters to GCB=LinkingConsonant.

Note: The final date for deciding exactly what to do with #29 will be in April, so there is some more time to discuss this. But we have to have a pretty solid proposal going into that April meeting.

The only test files that we have gotten from India so far include Devanagari, Malayalam and Bengali. I suspect that the UTC is likely to be conservative, and limit the GCB=Virama category to just those scripts that we have test files for, and that look complete.
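To make the proposed GB9c concrete, here is a rough Python sketch of how the check before a LinkingConsonant could behave. It is only an illustration under stated assumptions: the Virama and LinkingConsonant sets are hand-picked Devanagari characters rather than real property data, GCB=Extend is approximated by the general categories Mn/Me, and the function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"
# Assumption: Devanagari only, for illustration. Real data would come from
# the GraphemeBreakProperty file once the categories are settled.
VIRAMA = {"\u094D"}
LINKING_CONSONANT = {chr(cp) for cp in range(0x0915, 0x093A)}  # KA..HA


def is_virama_extend(ch):
    """Approximate ViramaExtend = Extend minus Virama minus ccc=0."""
    # Assumption: Mn/Me stands in for GCB=Extend in this sketch.
    is_extend = unicodedata.category(ch) in ("Mn", "Me")
    return is_extend and ch not in VIRAMA and unicodedata.combining(ch) != 0


def gb9c_no_break_before(text, i):
    """True if (Virama ViramaExtend* | ZWJ) x LinkingConsonant applies at i."""
    if i <= 0 or i >= len(text) or text[i] not in LINKING_CONSONANT:
        return False
    if text[i - 1] == ZWJ:
        return True
    # Skip back over any ViramaExtend characters, then require a Virama.
    j = i - 1
    while j >= 0 and is_virama_extend(text[j]):
        j -= 1
    return j >= 0 and text[j] in VIRAMA
```

For example, `gb9c_no_break_before("\u0915\u094D\u0951\u0937", 3)` is true because U+0951 (ccc=230) is skipped as a ViramaExtend before the virama is found, while a ViramaExtend placed after a ZWJ does not match, in line with the ccc=0 argument above.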
Mark

On Mon, Dec 11, 2017 at 2:16 AM, Richard Wordingham via Unicode <unicode@unicode.org> wrote:

> On Sun, 10 Dec 2017 21:14:18 -0800
> Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
>
> > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
> >
> > You can also explicitly request ligatureification with a ZWJ, so
> > perhaps this rule should be something like
> >
> > (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant
> >
> > -Manish
> >
> > On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode <
> > unicode@unicode.org> wrote:
> >
> > > 1. You make a good point about the GB9c. It should probably instead
> > > be something like:
> > >
> > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
> This change is unnecessary. If we start from Draft 1 where there are:
>
> GB9: × (Extend | ZWJ | Virama)
> GB9c: (Virama | ZWJ ) × LinkingConsonant
>
> If the classes used in the rules are to be disjoint, we then have to
> split Extend into something like ViramaExtend and OtherExtend to allow
> normalised (NFC/NFD) text, at which point we may as well continue to
> have rules that work without any normalisation. Informally,
>
> ViramaExtend = Extend and ccc ≠ 0.
>
> OtherExtend = Extend and ccc = 0.
>
> (We might need to put additional characters in ViramaExtend.)
>
> This gives us rules:
>
> GB9': × (OtherExtend | ViramaExtend | ZWJ | Virama)
>
> GB9c': (Virama | ZWJ ) ViramaExtend* × LinkingConsonant
>
> So, for a sequence <virama, ZWJ, nukta, LinkingConsonant>, GB9' gives us
>
> virama × ZWJ × nukta LinkingConsonant
>
> and GB9c' gives us
>
> virama × ZWJ × nukta × LinkingConsonant
>
> ---
> In Rule GB9c, what examples justify including ZWJ? Are they just the C1
> half-forms? My knowledge suggests that
>
> GB9c'': Virama (ZWJ | ViramaExtend)* × LinkingConsonant
>
> might be more appropriate.
>
> Richard.