On Fri, 1 Aug 2025 22:19:22 GMT, Justin Lu <j...@openjdk.org> wrote: >> Naoto Sato has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8363972-Loose-matching-dash >> - Spec update >> - Supplementary/CanonEq tests >> - flipped again, which was correct >> - flipped the size check >> - Address review comments >> - Merge branch 'master' into JDK-8363972-Loose-matching-dash >> - tidying up >> - test location >> - spec update >> - ... and 6 more: https://git.openjdk.org/jdk/compare/8e921aee...3682484d > > src/java.base/share/classes/java/text/DecimalFormat.java line 3543: > >> 3541: var lmsp = Pattern.compile("[" + lms + "]", >> Pattern.CANON_EQ); >> 3542: var a = lmsp.matcher(affix).replaceAll("-"); >> 3543: var t = lmsp.matcher(text.substring(position, >> Math.min(tlen, position + alen))) > > I don't think this works. If you see my other comment regarding the test case > and swap it such that `lenientMinusSign` is "-\u00E4" and the pattern (affix) > includes "a\u0308" and we parse some text such as "\u00E41.5". > > Then we have the following, > > > var lmsp = Pattern.compile("[" + lms + "]", Pattern.CANON_EQ); // Returns > pattern [-ä] > var a = lmsp.matcher(affix).replaceAll("-"); // Gets correctly normalized to > "-" > // Incorrectly matches against "ä1" and normalized to "-1" since the > substring returned is indexed from 0 to 2. > var t = lmsp.matcher(text.substring(position, Math.min(tlen, position + > alen))) > > > That is, I think we need to substring after we have performed the > normalization. Something such as, > > > var lmsp = Pattern.compile("[" + lms + "]", Pattern.CANON_EQ); > var a = lmsp.matcher(affix).replaceAll("-"); > var t = lmsp.matcher(text).replaceAll("-"); > return t.startsWith(a, position); > > > However, we will still run into issues later in DecimalFormat, as the > position is incremented by the original prefix length. > > > } else if (gotNegative) { > position += negativePrefix.length(); > > > So for "ä1.5" we would start parsing at position = 2, erroneously returning > "0.5". So further thought may be needed.
The current implementations assume the prefix/suffix lengths in a lot of places (no wonder), should be re-examined. > test/jdk/java/text/Format/NumberFormat/LenientMinusSignTest.java line 128: > >> 126: .set(dfs, "-\u00E5"); >> 127: var df = new DecimalFormat("#.#;a\u0308#.#", dfs); >> 128: assertEquals(df.parse("a\u03081.5"), -1.5); > > This one confuses me, should it not be parsing "\u00E51.5" such that the test > can check if canonical equivalence occurs when matching the text "\u00E5" to > the affix "a\u0308"? Otherwise, we are just parsing the exact pattern we set? > Also I think it should be "\u00E4", not "\u00E5". That's correct. I think the simple case is not that simple. Need to rethink. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2249005477 PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2249005036