On 1/27/2012 1:16 PM, Matt Ma wrote:
Hi,

There are a few characters that have no decomposition type defined in
UnicodeData.txt, but they were nonetheless assigned tertiary weights in
allkeys.txt as if they had a decomposition type. Here are a
few examples (version 6.0.0):

...

U+A733, U+A732, and U+1F1E6 were given tertiary weights as if they were
<compat>, while U+31B4 was weighted as if it were <final>.

Yep, that is all done deliberately, to make the default sorting a bit more consistent.
The normative decompositions in UnicodeData.txt are only the starting point
for attempting to give more consistent default weights for collation.
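To find entries like the ones Matt lists, you can compare the tertiary weights on an allkeys.txt line with the decomposition tag in UnicodeData.txt. The Python sketch below does this, under several assumptions. The function names are hypothetical. PLAIN_TERTIARIES assumes that 0x0002 and 0x0008 are the ordinary lowercase and uppercase tertiary values from the UCA tertiary weight table. The sample line uses made-up weights, not the real 6.0.0 values. Also, Python's unicodedata module reflects the Unicode version bundled with the interpreter, which may not be 6.0.0.

```python
import re
import unicodedata

# A collation element looks like "[.1C47.0020.0002]", or in older files
# "[.1C47.0020.0002.0061]" with a fourth weight; '*' marks a variable element.
_CE = re.compile(r"\[([.*])([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)(?:\.([0-9A-F]+))?\]")

# Assumption: tertiary values that do NOT signal any decomposition-style variant.
PLAIN_TERTIARIES = frozenset({0x0002, 0x0008})


def parse_allkeys_line(line):
    """Return (code_points, tertiary_weights) for one allkeys.txt entry, or None."""
    body = line.split("#", 1)[0].strip()
    if not body or body.startswith("@") or ";" not in body:
        return None  # comment, blank line, or @version-style directive
    cps_field, ces_field = body.split(";", 1)
    code_points = [int(cp, 16) for cp in cps_field.split()]
    tertiaries = [int(m.group(4), 16) for m in _CE.finditer(ces_field)]
    return code_points, tertiaries


def decomposition_tag(cp):
    """Return the UnicodeData decomposition tag ('<compat>', '<final>', ...),
    '' for a canonical decomposition, or None if there is no decomposition."""
    decomp = unicodedata.decomposition(chr(cp))
    if not decomp:
        return None
    first = decomp.split()[0]
    return first if first.startswith("<") else ""


def flag_implied_tertiary(line, plain=PLAIN_TERTIARIES):
    """True if a single-code-point entry has no decomposition in UnicodeData
    but carries a non-zero tertiary weight outside the assumed plain set."""
    parsed = parse_allkeys_line(line)
    if parsed is None:
        return False
    code_points, tertiaries = parsed
    if len(code_points) != 1 or decomposition_tag(code_points[0]) is not None:
        return False
    return any(t not in plain for t in tertiaries if t != 0)


# Hypothetical line: the weights are invented for illustration only.
print(flag_implied_tertiary("A733 ; [.1234.0020.0004.A733] # LATIN SMALL LETTER AA"))
```

Running this over every line of a real allkeys.txt would list the entries whose tertiary weights go beyond what UnicodeData.txt's decompositions alone would predict. Ken's reply says these extra weights are intentional.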


Is this something documented outside of UCA?

No, because it is only relevant *to* UCA. At least as far as documentation
written by the UTC is concerned.

Well, I suppose it is also relevant to CLDR, because CLDR bases its collation tables on a tailoring of allkeys.txt from UCA. I don't know what documentation
there may or may not be about the default treatment for tertiary weights
in CLDR. Somebody involved in the details of CLDR collation will have
to answer that one.

--Ken


