Re: RFR: 8361076: Add benchmark for ImageReader in preparation for Valhalla changes [v3]
On Tue, 1 Jul 2025 16:50:09 GMT, David Beaumont wrote: >> test/micro/org/openjdk/bench/jdk/internal/jrtfs/ImageReaderBenchmark.java >> line 230: >> >>> 228: // Created by running "java -verbose:class", throwing away >>> anonymous inner >>> 229: // classes and anything without a reliable name, and grouping by >>> the stated >>> 230: // source. It's not perfect, but it's representative. >> >> I don't think this is maintainable. How useful (or not) is this benchmark if >> the names of all the internal classes (that will change from release to >> release) are dropped from this? > > Debatable. It's obviously going to scale any results somewhat based on the > size of the resources and number of classes. It's kind nice to see "this > change removes at least N micro/milli seconds of time spent" since that's a > minimum set of classes we expect to always be needed, so any time saved is a > lower bound. > > I'd say maybe leave it as is for now with a note saying "if this keeps > breaking, make the list less fragile". > I'm also assuming this is only run manually, and not a part of any CI > pipeline ... so please let me know if I'm wrong about that. lib/classlist is used to generate the AOT archive, I wonder if a `@Setup` could be used to consume that to create the INIT_CLASSES. - PR Review Comment: https://git.openjdk.org/jdk/pull/26044#discussion_r2209331471
RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented
`open/doc/building.md` uses the incorrect `OEL6` keyword to denote the Oracle Enterprise Linux base OS distribution for Devkit – replaced it with the correct one instead, i.e., `OL`. - Commit messages: - Fix OL-related base OS keywords in Devkit docs Changes: https://git.openjdk.org/jdk/pull/26311/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26311&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8362244 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26311/head:pull/26311 PR: https://git.openjdk.org/jdk/pull/26311
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]
> Regex class should conform to **_Level 1_** of [Unicode Technical Standard > #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus > RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. > > This PR primarily addresses conformance with RL1.5: Simple Loose Matches, > which requires that simple case folding be applied to literals and > (optionally) to character classes. When applied to character classes, each > class is expected to be closed under simple case folding. See the standard > for a detailed explanation of what it means for a class to be “**_closed_**.” > > **RL1.5 states**: > > To meet this requirement, an implementation that supports case-sensitive > matching should > > 1. Provide at least the simple, default Unicode case-insensitive > matching, and > 2. Specify which character properties or constructs are closed under the > matching. > > **In the Pattern implementation**, 5 types of constructs may be affected by > case sensitivity: > > 1. back-refs > 2. string slices (sequences) > 3. single character, > 4. character families (Unicode Properties ...), and > 5. character class ranges > > **Note**: Single characters and families may appear independently or within a > character class. > > For case-insensitive (loose) matching, the implementation already applies > Character.toUpperCase() and Character.toLowerCase() to **both the pattern and > the input string** for back-refs, slices, and single characters. This > effectively makes these constructs closed under case folding. > > This has been verified in the newly added test case > **_test/jdk/java/util/regex/CaseFoldingTest.java_**. > > For example: > > Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true > Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true > > The character properties (families) are not "closed" and should remain > unchanged. This is acceptable per RL1.5, if the behavior is clearly > specified (TBD: update javadoc to reflect this). > > **Current Non-Conformance: Character Class Ranges**, as reported in the > original bug report. > > Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true > > vs Perl. (Perl also claims to support the Unicode's loose match with it it's > "i" modifier) > > perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => > false > perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/**_i_** ? "true\n" : > "false\n"'. => **_true_** > > The root issue is that the ran... Xueming Shen has updated the pull request incrementally with one additional commit since the last revision: improve the lookup logic and test case for +00df - Changes: - all: https://git.openjdk.org/jdk/pull/26285/files - new: https://git.openjdk.org/jdk/pull/26285/files/c2afc42c..b85f581f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=03-04 Stats: 44 lines in 3 files changed: 31 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/26285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26285/head:pull/26285 PR: https://git.openjdk.org/jdk/pull/26285
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]
On Mon, 14 Jul 2025 07:28:09 GMT, Xueming Shen wrote: >> src/java.base/share/classes/jdk/internal/util/regex/CaseFolding.java.template >> line 99: >> >>> 97: */ >>> 98: public static int[] getClassRangeClosingCharacters(int start, int >>> end) { >>> 99: int[] expanded = new int[expanded_casefolding.size()]; >> >> Can be `Math.min(expanded_casefolding.size(), end - start)` in case the >> table grows large, and update the `off < expanded.length` check below too. > > The table itself probably isn't going to grow significantly anytime soon, and > we’ll likely have enough time to adjust if CaseFolding.txt does get > substantially bigger. > > That said, I probably should consider reversing the lookup logic: instead of > iterating through [start, end], we could iterate over the expansion table and > check whether any of its code points fall within the input range, at least > when the range size is larger than the size of the table, kinda O(n) vs > O(1)-ish. updated the lookup logic as discussed. - PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2207809731
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]
On Tue, 15 Jul 2025 17:44:00 GMT, Xueming Shen wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. >> >> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, >> which requires that simple case folding be applied to literals and >> (optionally) to character classes. When applied to character classes, each >> class is expected to be closed under simple case folding. See the standard >> for a detailed explanation of what it means for a class to be “**_closed_**.” >> >> **RL1.5 states**: >> >> To meet this requirement, an implementation that supports case-sensitive >> matching should >> >> 1. Provide at least the simple, default Unicode case-insensitive >> matching, and >> 2. Specify which character properties or constructs are closed under the >> matching. >> >> **In the Pattern implementation**, 5 types of constructs may be affected by >> case sensitivity: >> >> 1. back-refs >> 2. string slices (sequences) >> 3. single character, >> 4. character families (Unicode Properties ...), and >> 5. character class ranges >> >> **Note**: Single characters and families may appear independently or within >> a character class. >> >> For case-insensitive (loose) matching, the implementation already applies >> Character.toUpperCase() and Character.toLowerCase() to **both the pattern >> and the input string** for back-refs, slices, and single characters. This >> effectively makes these constructs closed under case folding. >> >> This has been verified in the newly added test case >> **_test/jdk/java/util/regex/CaseFoldingTest.java_**. >> >> For example: >> >> Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true >> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true >> >> The character properties (families) are not "closed" and should remain >> unchanged. This is acceptable per RL1.5, if the behavior is clearly >> specified (TBD: update javadoc to reflect this). >> >> **Current Non-Conformance: Character Class Ranges**, as reported in the >> original bug report. >> >> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false >> vs >> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true >> >> vs Perl. (Perl also claims to support the Unicode's loose match with it it's >> "i" modifier) >> >> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => >> false >> perl -C -e 'print "S" =~ /[\x{017f}-\x{0... > > Xueming Shen has updated the pull request incrementally with one additional > commit since the last revision: > > update to fix the typo Marked as reviewed by naoto (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/26285#pullrequestreview-3021537588
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]
> Regex class should conform to **_Level 1_** of [Unicode Technical Standard > #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus > RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. > > This PR primarily addresses conformance with RL1.5: Simple Loose Matches, > which requires that simple case folding be applied to literals and > (optionally) to character classes. When applied to character classes, each > class is expected to be closed under simple case folding. See the standard > for a detailed explanation of what it means for a class to be “**_closed_**.” > > **RL1.5 states**: > > To meet this requirement, an implementation that supports case-sensitive > matching should > > 1. Provide at least the simple, default Unicode case-insensitive > matching, and > 2. Specify which character properties or constructs are closed under the > matching. > > **In the Pattern implementation**, 5 types of constructs may be affected by > case sensitivity: > > 1. back-refs > 2. string slices (sequences) > 3. single character, > 4. character families (Unicode Properties ...), and > 5. character class ranges > > **Note**: Single characters and families may appear independently or within a > character class. > > For case-insensitive (loose) matching, the implementation already applies > Character.toUpperCase() and Character.toLowerCase() to **both the pattern and > the input string** for back-refs, slices, and single characters. This > effectively makes these constructs closed under case folding. > > This has been verified in the newly added test case > **_test/jdk/java/util/regex/CaseFoldingTest.java_**. > > For example: > > Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true > Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true > > The character properties (families) are not "closed" and should remain > unchanged. This is acceptable per RL1.5, if the behavior is clearly > specified (TBD: update javadoc to reflect this). > > **Current Non-Conformance: Character Class Ranges**, as reported in the > original bug report. > > Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true > > vs Perl. (Perl also claims to support the Unicode's loose match with it it's > "i" modifier) > > perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => > false > perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/**_i_** ? "true\n" : > "false\n"'. => **_true_** > > The root issue is that the ran... Xueming Shen has updated the pull request incrementally with one additional commit since the last revision: update to fix the typo - Changes: - all: https://git.openjdk.org/jdk/pull/26285/files - new: https://git.openjdk.org/jdk/pull/26285/files/b85f581f..a090888f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26285/head:pull/26285 PR: https://git.openjdk.org/jdk/pull/26285
Re: RFR: 8361076: Add benchmark for ImageReader in preparation for Valhalla changes [v4]
On Tue, 1 Jul 2025 17:02:27 GMT, David Beaumont wrote: >> Initial benchmark to capture at least some comparative measures of >> ImageReader performance. >> >> Current results on my laptop: >> >> Benchmark Mode Cnt ScoreError >> Units >> NewImageBenchmark.warmCache_CountAllNodes avgt5 0.785 ± 0.140 >> ms/op >> NewImageBenchmark.coldStart_CountOnly ss5 34.051 ± 17.662 >> ms/op >> NewImageBenchmark.coldStart_InitAndCountss5 31.006 ± 9.674 >> ms/op >> NewImageBenchmark.coldStart_LoadJavacInitClassesss5 3.752 ± 6.873 >> ms/op >> >> The upcoming changes to ImageReader should show significant improvements to >> these results. > > David Beaumont has updated the pull request incrementally with one additional > commit since the last revision: > > Adding comment about maintainability. Oddly while the status at the bottom of the conversation says it has "2 approvals", the tickbox: Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author is not checked. I can try to integrate, but it might not work. - PR Comment: https://git.openjdk.org/jdk/pull/26044#issuecomment-3075753460
Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented
On Tue, 15 Jul 2025 09:58:02 GMT, Volkan Yazici wrote: > `open/doc/building.md` uses the incorrect `OEL6` keyword to denote the Oracle > Enterprise Linux base OS distribution for Devkit – replaced it with the > correct one instead, i.e., `OL`. You also need to regenerate the corresponding building.html using the `update-build-docs` make target doc/building.md line 1289: > 1287: > 1288: `BASE_OS` must be one of `OL` for Oracle Enterprise Linux or `Fedora` > (if > 1289: not specified `OL` will be the default). If the base OS is `Fedora` the Ever so slightly unrelated to your change, but perhaps worth fixing at the same time, even if it means widening the scope of the issue a bit: There is no default, so suggest dropping the parenthesis part. doc/building.md line 1291: > 1289: not specified `OL` will be the default). If the base OS is `Fedora` the > 1290: corresponding Fedora release can be specified with the help of the > 1291: `BASE_OS_VERSION` option (with `27` as default version). If the build is Also unrelated but maybe worth fixing: the default Fedora version is 41 nowadays. - PR Review: https://git.openjdk.org/jdk/pull/26311#pullrequestreview-3021243737 PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208015960 PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208016509
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]
On Tue, 15 Jul 2025 15:11:07 GMT, Xueming Shen wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. >> >> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, >> which requires that simple case folding be applied to literals and >> (optionally) to character classes. When applied to character classes, each >> class is expected to be closed under simple case folding. See the standard >> for a detailed explanation of what it means for a class to be “**_closed_**.” >> >> **RL1.5 states**: >> >> To meet this requirement, an implementation that supports case-sensitive >> matching should >> >> 1. Provide at least the simple, default Unicode case-insensitive >> matching, and >> 2. Specify which character properties or constructs are closed under the >> matching. >> >> **In the Pattern implementation**, 5 types of constructs may be affected by >> case sensitivity: >> >> 1. back-refs >> 2. string slices (sequences) >> 3. single character, >> 4. character families (Unicode Properties ...), and >> 5. character class ranges >> >> **Note**: Single characters and families may appear independently or within >> a character class. >> >> For case-insensitive (loose) matching, the implementation already applies >> Character.toUpperCase() and Character.toLowerCase() to **both the pattern >> and the input string** for back-refs, slices, and single characters. This >> effectively makes these constructs closed under case folding. >> >> This has been verified in the newly added test case >> **_test/jdk/java/util/regex/CaseFoldingTest.java_**. >> >> For example: >> >> Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true >> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true >> >> The character properties (families) are not "closed" and should remain >> unchanged. This is acceptable per RL1.5, if the behavior is clearly >> specified (TBD: update javadoc to reflect this). >> >> **Current Non-Conformance: Character Class Ranges**, as reported in the >> original bug report. >> >> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false >> vs >> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true >> >> vs Perl. (Perl also claims to support the Unicode's loose match with it it's >> "i" modifier) >> >> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => >> false >> perl -C -e 'print "S" =~ /[\x{017f}-\x{0... > > Xueming Shen has updated the pull request incrementally with one additional > commit since the last revision: > > improve the lookup logic and test case for +00df Thanks for the reviews! - PR Comment: https://git.openjdk.org/jdk/pull/26285#issuecomment-3074413884
Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented [v2]
On Tue, 15 Jul 2025 16:46:24 GMT, Mikael Vidstedt wrote: >> Volkan Yazici has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Apply review feedback > > doc/building.md line 1289: > >> 1287: >> 1288: `BASE_OS` must be one of `OL` for Oracle Enterprise Linux or `Fedora` >> (if >> 1289: not specified `OL` will be the default). If the base OS is `Fedora` the > > Ever so slightly unrelated to your change, but perhaps worth fixing at the > same time, even if it means widening the scope of the issue a bit: > > There is no default, so suggest dropping the parenthesis part. Removed in 0d85899ae1a. - PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208283263
Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented [v2]
> `open/doc/building.md` uses the incorrect `OEL6` keyword to denote the Oracle > Enterprise Linux base OS distribution for Devkit – replaced it with the > correct one instead, i.e., `OL`. Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: Apply review feedback - Changes: - all: https://git.openjdk.org/jdk/pull/26311/files - new: https://git.openjdk.org/jdk/pull/26311/files/d267739d..0d85899a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26311&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26311&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26311/head:pull/26311 PR: https://git.openjdk.org/jdk/pull/26311
Integrated: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char
On Mon, 14 Jul 2025 04:53:13 GMT, Xueming Shen wrote: > Regex class should conform to **_Level 1_** of [Unicode Technical Standard > #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus > RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. > > This PR primarily addresses conformance with RL1.5: Simple Loose Matches, > which requires that simple case folding be applied to literals and > (optionally) to character classes. When applied to character classes, each > class is expected to be closed under simple case folding. See the standard > for a detailed explanation of what it means for a class to be “**_closed_**.” > > **RL1.5 states**: > > To meet this requirement, an implementation that supports case-sensitive > matching should > > 1. Provide at least the simple, default Unicode case-insensitive > matching, and > 2. Specify which character properties or constructs are closed under the > matching. > > **In the Pattern implementation**, 5 types of constructs may be affected by > case sensitivity: > > 1. back-refs > 2. string slices (sequences) > 3. single character, > 4. character families (Unicode Properties ...), and > 5. character class ranges > > **Note**: Single characters and families may appear independently or within a > character class. > > For case-insensitive (loose) matching, the implementation already applies > Character.toUpperCase() and Character.toLowerCase() to **both the pattern and > the input string** for back-refs, slices, and single characters. This > effectively makes these constructs closed under case folding. > > This has been verified in the newly added test case > **_test/jdk/java/util/regex/CaseFoldingTest.java_**. > > For example: > > Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true > Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true > > The character properties (families) are not "closed" and should remain > unchanged. This is acceptable per RL1.5, if the behavior is clearly > specified (TBD: update javadoc to reflect this). > > **Current Non-Conformance: Character Class Ranges**, as reported in the > original bug report. > > Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true > > vs Perl. (Perl also claims to support the Unicode's loose match with it it's > "i" modifier) > > perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => > false > perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/**_i_** ? "true\n" : > "false\n"'. => **_true_** > > The root issue is that the ran... This pull request has now been integrated. Changeset: 401af27b Author:Xueming Shen URL: https://git.openjdk.org/jdk/commit/401af27b9dbc701eb48e5bc685d3ad058e0de3bc Stats: 2084 lines in 9 files changed: 2079 ins; 0 del; 5 mod 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char Reviewed-by: naoto - PR: https://git.openjdk.org/jdk/pull/26285
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]
On Tue, 15 Jul 2025 17:47:29 GMT, Xueming Shen wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. >> >> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, >> which requires that simple case folding be applied to literals and >> (optionally) to character classes. When applied to character classes, each >> class is expected to be closed under simple case folding. See the standard >> for a detailed explanation of what it means for a class to be “**_closed_**.” >> >> **RL1.5 states**: >> >> To meet this requirement, an implementation that supports case-sensitive >> matching should >> >> 1. Provide at least the simple, default Unicode case-insensitive >> matching, and >> 2. Specify which character properties or constructs are closed under the >> matching. >> >> **In the Pattern implementation**, 5 types of constructs may be affected by >> case sensitivity: >> >> 1. back-refs >> 2. string slices (sequences) >> 3. single character, >> 4. character families (Unicode Properties ...), and >> 5. character class ranges >> >> **Note**: Single characters and families may appear independently or within >> a character class. >> >> For case-insensitive (loose) matching, the implementation already applies >> Character.toUpperCase() and Character.toLowerCase() to **both the pattern >> and the input string** for back-refs, slices, and single characters. This >> effectively makes these constructs closed under case folding. >> >> This has been verified in the newly added test case >> **_test/jdk/java/util/regex/CaseFoldingTest.java_**. >> >> For example: >> >> Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true >> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true >> >> The character properties (families) are not "closed" and should remain >> unchanged. This is acceptable per RL1.5, if the behavior is clearly >> specified (TBD: update javadoc to reflect this). >> >> **Current Non-Conformance: Character Class Ranges**, as reported in the >> original bug report. >> >> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false >> vs >> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true >> >> vs Perl. (Perl also claims to support the Unicode's loose match with it it's >> "i" modifier) >> >> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => >> false >> perl -C -e 'print "S" =~ /[\x{017f}-\x{0... > > Xueming Shen has updated the pull request incrementally with one additional > commit since the last revision: > > update to fix the typo Thanks for the reviews! - PR Comment: https://git.openjdk.org/jdk/pull/26285#issuecomment-3074710596
Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented [v2]
On Tue, 15 Jul 2025 16:46:44 GMT, Mikael Vidstedt wrote: >> Volkan Yazici has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Apply review feedback > > doc/building.md line 1291: > >> 1289: not specified `OL` will be the default). If the base OS is `Fedora` the >> 1290: corresponding Fedora release can be specified with the help of the >> 1291: `BASE_OS_VERSION` option (with `27` as default version). If the build >> is > > Also unrelated but maybe worth fixing: the default Fedora version is 41 > nowadays. In 0d85899ae1a, I've removed the entire `(with ... as default version)` snippet – I doubt if it is worth the effort of trying to keep up with that variable, i.e., the Fedora version. We neither share it for OL. Please let me know if you agree or disagree. - PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208293042
RE: malloc/calloc return value NULL check
Regarding calloc – checking , jdk.incubator.vector/unix/native/libsleef seems to have quite a lot of callocs without a NULL check afterwards , but afaik it is 3rd party coding so we will probably not touch it. For sunFont.c I see no direct handling after calloc in the C code : java.desktop/share/native/libfontmanager/sunFont.c-67- */ java.desktop/share/native/libfontmanager/sunFont.c-68-JNIEXPORT jlong JNICALL Java_sun_font_NullFontScaler_getGlyphImage java.desktop/share/native/libfontmanager/sunFont.c-69- (JNIEnv *env, jobject scaler, jlong pContext, jint glyphCode) { java.desktop/share/native/libfontmanager/sunFont.c:70:void *nullscaler = calloc(1, sizeof(GlyphInfo)); java.desktop/share/native/libfontmanager/sunFont.c-71-return ptr_to_jlong(nullscaler); java.desktop/share/native/libfontmanager/sunFont.c-72-} java.desktop/share/native/libfontmanager/sunFont.c-73- -- java.desktop/share/native/libfontmanager/sunFont.c-303-JNIEXPORT jlong JNICALL java.desktop/share/native/libfontmanager/sunFont.c-304-Java_sun_font_StrikeCache_getInvisibleGlyphPtr(JNIEnv *env, jclass cls) { java.desktop/share/native/libfontmanager/sunFont.c-305- java.desktop/share/native/libfontmanager/sunFont.c:306:GlyphInfo *info = (GlyphInfo*) calloc(1, sizeof(GlyphInfo)); java.desktop/share/native/libfontmanager/sunFont.c-307-return (jlong)(uintptr_t)info; /* invisible glyph */ java.desktop/share/native/libfontmanager/sunFont.c-308-} (but in the Java coding calling this, we seem to have some special checks for 0, so maybe it is fine) Here java.base/windows/native/libjli/java_md.c:951:appArgIdx = calloc(argc, sizeof(int)); java.base/windows/native/libjli/java_md.c-952-for (i = idx, j = 0; i < stdargc; i++) { java.base/windows/native/libjli/java_md.c-953-if (isTool) { // filter -J used by tools to pass JVM options java.base/windows/native/libjli/java_md.c-954-arg = stdargs[i].arg; we seem to miss a check, should I open a JBS issue ? Best regards, Matthias From: Thomas Stüfe Sent: Friday, 11 July 2025 18:19 To: Baesken, Matthias Cc: build-dev@openjdk.org Subject: Re: malloc/calloc return value NULL check Absolutely, yes. The larger the allocated size, the more important. Linux kernel, by default, only protects a small area against NULL accesses; depending on distro, 4KB or 64 (?) KB. And the JVM, at various places, allocates in low-area ranges. So accessing NULL+ can actually land you at a valid unrelated address instead of faulting. /Thomas On Fri, Jul 11, 2025 at 2:57 PM Baesken, Matthias mailto:matthias.baes...@sap.com>> wrote: Hi, when playing around with the GCC static analyzer ( https://developers.redhat.com/articles/2022/04/12/state-static-analysis-gcc-12-compiler ) I noticed a lot of complaints about missing NULL checks of malloc/calloc return values in the code base. While we check these return values for NULL at a lot of places in the codebase, it is not done always. Should we do it always (except 3rd party code probably where we do not want to have large diffs to upstream) ? Or is it considered not important enough to do it always? Best regards, Matthias
Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]
On Tue, 15 Jul 2025 15:11:07 GMT, Xueming Shen wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. >> >> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, >> which requires that simple case folding be applied to literals and >> (optionally) to character classes. When applied to character classes, each >> class is expected to be closed under simple case folding. See the standard >> for a detailed explanation of what it means for a class to be “**_closed_**.” >> >> **RL1.5 states**: >> >> To meet this requirement, an implementation that supports case-sensitive >> matching should >> >> 1. Provide at least the simple, default Unicode case-insensitive >> matching, and >> 2. Specify which character properties or constructs are closed under the >> matching. >> >> **In the Pattern implementation**, 5 types of constructs may be affected by >> case sensitivity: >> >> 1. back-refs >> 2. string slices (sequences) >> 3. single character, >> 4. character families (Unicode Properties ...), and >> 5. character class ranges >> >> **Note**: Single characters and families may appear independently or within >> a character class. >> >> For case-insensitive (loose) matching, the implementation already applies >> Character.toUpperCase() and Character.toLowerCase() to **both the pattern >> and the input string** for back-refs, slices, and single characters. This >> effectively makes these constructs closed under case folding. >> >> This has been verified in the newly added test case >> **_test/jdk/java/util/regex/CaseFoldingTest.java_**. >> >> For example: >> >> Pattern.compile("(?ui)\u017f").matcher("S").matches(). => true >> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true >> >> The character properties (families) are not "closed" and should remain >> unchanged. This is acceptable per RL1.5, if the behavior is clearly >> specified (TBD: update javadoc to reflect this). >> >> **Current Non-Conformance: Character Class Ranges**, as reported in the >> original bug report. >> >> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false >> vs >> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true >> >> vs Perl. (Perl also claims to support the Unicode's loose match with it it's >> "i" modifier) >> >> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'. => >> false >> perl -C -e 'print "S" =~ /[\x{017f}-\x{0... > > Xueming Shen has updated the pull request incrementally with one additional > commit since the last revision: > > improve the lookup logic and test case for +00df Updates look good to me. test/jdk/java/util/regex/CaseFoldingTest.java line 51: > 49: var excluded = Set.of( > 50: // these 'S' characters failed for known reason. they don't > map to their > 51: // fording form with toUpperCase or toLowerCase, only map > with case-folding. nit: fording -> folding - PR Review: https://git.openjdk.org/jdk/pull/26285#pullrequestreview-3021193767 PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2207985998