On Mon, 1 Dec 2025 23:03:17 GMT, Naoto Sato <[email protected]> wrote:
>> Xueming Shen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> minor doc formatting update
>
> make/jdk/src/classes/build/tools/generatecharacter/GenerateCaseFolding.java line 79:
>
>> 77: // hack, hack, hack! the logic does not pick 0131. just add manually to support 'I's.
>> 78: // 0049; T; 0131; # LATIN CAPITAL LETTER I
>> 79: final String T_0x0131_0x49 = String.format(" entry(0x%04x, 0x%04x),\n", 0x0131, 0x49);
>
> The 'T' status reads (in CaseFolding.txt):
>
> # T: special case for uppercase I and dotted uppercase I
> # - For non-Turkic languages, this mapping is normally not used.
> # - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
> # Note that the Turkic mappings do not maintain canonical equivalence without additional processing.
>
> Since this casefold feature is locale-independent, should this `T` status be
> ignored? It might be helpful to mention in the spec whether we do this `T` case
> folding.
`T_0x0131_0x49` is for the table `Expanded_Case_Map_Entries`, which is used only by
regex. See:
https://openjdk.github.io/cr/?repo=jdk&pr=26285&range=05#new-1-make/jdk/src/classes/build/tools/generatecharacter/CaseFolding.java
The case-folding mapping for regex uses the C, T, and S statuses (CTS) to match the existing behavior.
We may want to "consolidate" the spec and implementation later, but that is not
within the scope of this change.
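To illustrate what the status choice changes, here is a minimal, hypothetical Java sketch (not the JDK's `GenerateCaseFolding` logic). It unions code points over the CaseFolding.txt lines whose status letter is in a given set and skips multi-code-point (F) mappings. The class name `CaseFoldClasses`, the `SAMPLE` lines (a handful of entries copied in CaseFolding.txt format), and the union-find approach are all assumptions made for the illustration. With C+S only, U+0131 stays in its own class; adding T puts U+0131 in the same class as 'I', 'i' and U+0130.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: groups code points that a case-insensitive matcher
 * would treat as equal, using only the CaseFolding.txt statuses passed in.
 * Illustration only; not the JDK generator's actual algorithm.
 */
public class CaseFoldClasses {

    // A few entries in CaseFolding.txt format: code; status; mapping; # name
    static final String[] SAMPLE = {
        "0049; C; 0069; # LATIN CAPITAL LETTER I",
        "0049; T; 0131; # LATIN CAPITAL LETTER I",
        "004B; C; 006B; # LATIN CAPITAL LETTER K",
        "0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
        "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
        "212A; C; 006B; # KELVIN SIGN",
    };

    // Union-find parent links; a code point with no entry is its own root.
    private final Map<Integer, Integer> parent = new HashMap<>();

    private int find(int cp) {
        int root = cp;
        while (parent.containsKey(root)) {
            root = parent.get(root);
        }
        return root;
    }

    private void union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra != rb) {
            parent.put(ra, rb);
        }
    }

    /** Builds classes from the lines whose status letter is in {@code statuses}. */
    static CaseFoldClasses build(String[] lines, String statuses) {
        CaseFoldClasses classes = new CaseFoldClasses();
        for (String line : lines) {
            String[] fields = line.split(";");
            String status = fields[1].trim();
            String[] mapping = fields[2].trim().split(" ");
            // Skip unselected statuses and multi-code-point (full) mappings.
            if (!statuses.contains(status) || mapping.length != 1) {
                continue;
            }
            classes.union(Integer.parseInt(fields[0].trim(), 16),
                          Integer.parseInt(mapping[0], 16));
        }
        return classes;
    }

    boolean sameClass(int a, int b) {
        return find(a) == find(b);
    }

    public static void main(String[] args) {
        CaseFoldClasses cs = build(SAMPLE, "CS");   // simple, locale-independent folding
        CaseFoldClasses cts = build(SAMPLE, "CTS"); // also applies the Turkic 'T' entries
        System.out.println(cs.sameClass(0x0131, 'I'));  // false
        System.out.println(cts.sameClass(0x0131, 'I')); // true
        System.out.println(cts.sameClass(0x0130, 'i')); // true
        System.out.println(cs.sameClass(0x212A, 'k'));  // true
    }
}
```

The union-find is used here only because the C and T lines for U+0049 map it to two different targets, so a plain one-to-one fold map cannot express the CTS grouping.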
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2582279595