> How should this be identified then? Just by ranges or something?

It's been a while since I came across this, but IIRC there is a separate Indic 
property in the Unicode standard that says something about how characters 
combine, because the rules of Indic scripts are complex (see comments above ;-)
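A small illustration of why normalization alone doesn't cover the Indic case, sketched with Python's stdlib `unicodedata` (standing in here for whatever Unicode library Geany actually uses): a Devanagari conjunct like क्ष renders as one glyph, but there is no precomposed code point for it, so composition leaves the sequence untouched and you need the grapheme-cluster rules (UAX #29) to treat it as a unit:

```python
import unicodedata

# DEVANAGARI KA + VIRAMA + SSA -- rendered as the single conjunct
# "ksha", but there is no precomposed code point for it, so NFC
# leaves all three code points as-is.
ksha = "\u0915\u094D\u0937"
assert unicodedata.normalize("NFC", ksha) == ksha
assert len(unicodedata.normalize("NFC", ksha)) == 3

# The virama is a combining mark (canonical combining class 9) that
# glues the consonants together, yet knowing its combining class is
# not enough -- the cluster as a whole needs the UAX #29 rules.
assert unicodedata.combining("\u094D") == 9
```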

> If that's all we're after, I can keep the normalization step and remove the 
> manual (incomplete?) combining character support, which should still do the 
> right thing™ in the vast majority of cases

Well, the NFKC[^1] normalization should handle a lot of cases by itself.  The 
extra combining-character support adds the case where there is no pre-combined 
code point, so it will handle some more cases, but what proportion of those 
additional cases it gets right I can't say.  So better to be simple, even if 
it misses a few cases.
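For what it's worth, the gap between the two approaches is easy to show with Python's stdlib `unicodedata` (assumed here as a stand-in for glib's normalization): composition folds a pair only when a precomposed code point exists, which is exactly the case the manual combining-character support was trying to paper over:

```python
import unicodedata

# "e" + COMBINING ACUTE ACCENT has a precomposed form, so NFC folds
# it into the single code point U+00E9.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"

# "x" + COMBINING ACUTE ACCENT has no precomposed code point, so
# normalization leaves the two code points alone -- this is the case
# only the extra combining-character handling would catch.
assert len(unicodedata.normalize("NFC", "x\u0301")) == 2

# NFKC additionally folds compatibility characters, e.g. the "fi"
# ligature U+FB01 becomes the two plain letters "fi".
assert unicodedata.normalize("NFKC", "\ufb01") == "fi"
```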

> not counting the fact that it's currently terribly broken yet nobody 
> complained before.

Yes, it's hardly worth the effort to complicate a capability that appears to 
be little used; just being safe (i.e. selecting proper code points) is enough, 
since it can always be manually overridden if the simple answer is wrong.

[^1]: Why did glib use different names?  There is a standard (NFC, NFKC, etc.), 
so why did they invent their own names?  It's a guess which standard form each 
glib name corresponds to, and as usual it's not documented!!! [end rant] 
That's why my suggestion of `G_NORMALIZE_ALL_COMPOSE` was so tentative.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3846#issuecomment-2067650306