Re: RFR: 8363972: Loose matching of dash/minusSign in number parsing [v2]

Justin Lu Thu, 31 Jul 2025 17:15:25 -0700

On Thu, 31 Jul 2025 22:04:56 GMT, Naoto Sato <na...@openjdk.org> wrote:


>> Enabling lenient minus sign matching when parsing numbers. In some locales, 
>> e.g. Finnish, the default minus sign is the Unicode "Minus Sign" (U+2212), 
>> which is not the "Hyphen Minus" (U+002D) that users type in from keyboard. 
>> Thus the parsing of user input numbers may fail. This change utilizes CLDR's 
>> `parseLenient` element for minus signs and loosely matches them with the 
>> hyphen-minus so that user input numbers can parse. As this is a behavioral 
>> change, a corresponding CSR has been drafted.
>
> Naoto Sato has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Address review comments

src/java.base/share/classes/java/text/DecimalFormat.java line 3544:

> 3542:             var t = text.substring(position, Math.min(tlen, position + 
> alen))
> 3543:                 .replaceAll(lmsp, "-");
> 3544:             return t.regionMatches(0, a, 0, alen);

I presume the original solution was regex for all cases, and you made the 
single char case for the fast path. If you still have the perf benchmark handy, 
I wonder how a char by char approach would compare instead of regex. It would 
also simplify the code since it would be combining the slow and fast path. 
Although the perf is variable based on the affix string length vs lms length 
(11).


int i = 0;
for (; position + i < Math.min(tlen, position + alen); i++) {
    int tIndex = lms.indexOf(text.charAt(position + i));
    int aIndex = lms.indexOf(affix.charAt(i));
    // Non LMS. Match direct
    if (tIndex < 0 && aIndex < 0) {
        if (text.charAt(position + i) != affix.charAt(i)) {
            return false;
        }
    } else {
        // By here, at least one LMS. Ensure both LMS.
        if (tIndex < 0 || aIndex < 0) {
            return false;
        }
    }
}
// Return true if entire affix was matched
return i == alen;

test/jdk/java/text/Format/NumberFormat/LenientMinusSignTest.java line 120:

> 118:         public void testLenientPrefix(String sign) throws ParseException 
> {
> 119:             var df = new DecimalFormat(PREFIX, DFS);
> 120:             df.setStrict(false);

No need to set strict as false since `df` is lenient by default. If it was done 
for clarity, I think the method name already adequately indicates it is a 
lenient style test. Applies to CNF test as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2246593124
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2246491998

Re: RFR: 8363972: Loose matching of dash/minusSign in number parsing [v2]

Reply via email to