Re: RFR: 8365675: Add String Unicode Case-Folding Support [v11]

Xueming Shen Tue, 02 Dec 2025 10:07:13 -0800

On Mon, 1 Dec 2025 23:03:17 GMT, Naoto Sato <[email protected]> wrote:


>> Xueming Shen has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   minor doc formatting update
>
> make/jdk/src/classes/build/tools/generatecharacter/GenerateCaseFolding.java 
> line 79:
> 
>> 77:         // hack, hack, hack! the logic does not pick 0131. just add manually to support 'I's.
>> 78:         // 0049; T; 0131; # LATIN CAPITAL LETTER I
>> 79:         final String T_0x0131_0x49 = String.format("        entry(0x%04x, 0x%04x),\n", 0x0131, 0x49);
> 
> The 'T' status reads (in CaseFolding.txt):
> 
> # T: special case for uppercase I and dotted uppercase I
> #    - For non-Turkic languages, this mapping is normally not used.
> #    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
> #      Note that the Turkic mappings do not maintain canonical equivalence without additional processing.
> 
> Since this casefold feature is locale independent, should this `T` status be 
> ignored? It might be helpful if we mention in the spec if we do this `T` case 
> folding.

T_0x0131_0x49 is for the table Expanded_Case_Map_Entries, which is used by the 
regex only. See: 
https://openjdk.github.io/cr/?repo=jdk&pr=26285&range=05#new-1-make/jdk/src/classes/build/tools/generatecharacter/CaseFolding.java
The case-folding mapping for regex uses the C, T, and S statuses (CTS) to match 
the existing behavior. We may want to do something later to "consolidate" the 
spec and implementation, but that is not within the scope of this change.
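
To make the status distinction concrete, here is a minimal sketch of how 
filtering CaseFolding.txt lines by status changes which mappings end up in a 
table. It is not the code in GenerateCaseFolding.java: the class name, the 
select() helper, and the "code -> mapping" output format are made up for 
illustration. The sample lines follow the CaseFolding.txt format. Comparing 
against "CF" is also an assumption, used only to show a status set without T.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the JDK build tool, just the filtering idea.
final class CaseFoldingStatusSketch {

    // Sample lines in the CaseFolding.txt format: code; status; mapping; # name
    static final List<String> SAMPLE = List.of(
        "0049; C; 0069; # LATIN CAPITAL LETTER I",
        "0049; T; 0131; # LATIN CAPITAL LETTER I",
        "0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
        "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
        "1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S",
        "1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S");

    // Keep only the lines whose status letter occurs in 'statuses' (e.g. "CTS").
    static List<String> select(List<String> lines, String statuses) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;                       // comment-only or blank line
            }
            String[] f = data.split(";");
            if (f.length < 3) {
                continue;                       // not a mapping line
            }
            String status = f[1].trim();
            if (status.length() == 1 && statuses.contains(status)) {
                out.add(f[0].trim() + " -> " + f[2].trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With T included, 0049 also maps to 0131 (dotless i).
        System.out.println("CTS: " + select(SAMPLE, "CTS"));
        // Without T, the Turkic mappings are dropped.
        System.out.println("CF:  " + select(SAMPLE, "CF"));
    }
}
```

With "CTS" the 0049 -> 0131 line is selected, and with "CF" it is not. That 
is the difference the review comment asks about.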

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2582279595
