Re: RFR: 8361076: Add benchmark for ImageReader in preparation for Valhalla changes [v3]

2025-07-15 Thread Alan Bateman
On Tue, 1 Jul 2025 16:50:09 GMT, David Beaumont  wrote:

>> test/micro/org/openjdk/bench/jdk/internal/jrtfs/ImageReaderBenchmark.java 
>> line 230:
>> 
>>> 228: // Created by running "java -verbose:class", throwing away 
>>> anonymous inner
>>> 229: // classes and anything without a reliable name, and grouping by 
>>> the stated
>>> 230: // source. It's not perfect, but it's representative.
>> 
>> I don't think this is maintainable. How useful (or not) is this benchmark if 
>> the names of all the internal classes (that will change from release to 
>> release) are dropped from this?
>
> Debatable. It's obviously going to scale any results somewhat based on the 
> size of the resources and number of classes. It's kind nice to see "this 
> change removes at least N micro/milli seconds of time spent" since that's a 
> minimum set of classes we expect to always be needed, so any time saved is a 
> lower bound.
> 
> I'd say maybe leave it as is for now with a note saying "if this keeps 
> breaking, make the list less fragile".
> I'm also assuming this is only run manually, and not a part of any CI 
> pipeline ... so please let me know if I'm wrong about that.

lib/classlist is used to generate the AOT archive, I wonder if a `@Setup` could 
be used to consume that to create the INIT_CLASSES.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/26044#discussion_r2209331471


RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented

2025-07-15 Thread Volkan Yazici
`open/doc/building.md` uses the incorrect `OEL6` keyword to denote the Oracle 
Enterprise Linux base OS distribution for Devkit – replaced it with the correct 
one instead, i.e., `OL`.

-

Commit messages:
 - Fix OL-related base OS keywords in Devkit docs

Changes: https://git.openjdk.org/jdk/pull/26311/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26311&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8362244
  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/26311.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26311/head:pull/26311

PR: https://git.openjdk.org/jdk/pull/26311


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Xueming Shen
> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus 
> RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
> 
> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
> which requires that simple case folding be applied to literals and 
> (optionally) to character classes. When applied to character classes, each 
> class is expected to be closed under simple case folding. See the standard 
> for a detailed explanation of what it means for a class to be “**_closed_**.”
> 
> **RL1.5 states**: 
> 
> To meet this requirement, an implementation that supports case-sensitive 
> matching should
> 
> 1. Provide at least the simple, default Unicode case-insensitive 
> matching, and
> 2. Specify which character properties or constructs are closed under the 
> matching.
> 
> **In the Pattern implementation**, 5 types of constructs may be affected by 
> case sensitivity:
> 
> 1. back-refs
> 2. string slices (sequences)
> 3. single character,
> 4. character families (Unicode Properties ...), and
> 5. character class ranges
> 
> **Note**: Single characters and families may appear independently or within a 
> character class.
> 
> For case-insensitive (loose) matching, the implementation already applies 
> Character.toUpperCase() and Character.toLowerCase() to **both the pattern and 
> the input string** for back-refs, slices, and single characters. This 
> effectively makes these constructs closed under case folding.
> 
> This has been verified in the newly added test case 
> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
> 
> For example:
> 
> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
> 
> The character properties (families)  are not "closed" and should remain 
> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
> specified (TBD: update javadoc to reflect this).
> 
> **Current Non-Conformance: Character Class Ranges**, as reported in the 
> original bug report.
> 
> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
> vs
> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
> 
> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
> "i" modifier)
> 
> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
> false
> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/**_i_** ? "true\n" : 
> "false\n"'. => **_true_**
> 
> The root issue is that the ran...

Xueming Shen has updated the pull request incrementally with one additional 
commit since the last revision:

  improve the lookup logic and test case for +00df

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/26285/files
  - new: https://git.openjdk.org/jdk/pull/26285/files/c2afc42c..b85f581f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=03-04

  Stats: 44 lines in 3 files changed: 31 ins; 3 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/26285.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26285/head:pull/26285

PR: https://git.openjdk.org/jdk/pull/26285


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Xueming Shen
On Mon, 14 Jul 2025 07:28:09 GMT, Xueming Shen  wrote:

>> src/java.base/share/classes/jdk/internal/util/regex/CaseFolding.java.template
>>  line 99:
>> 
>>> 97:  */
>>> 98: public static int[] getClassRangeClosingCharacters(int start, int 
>>> end) {
>>> 99: int[] expanded = new int[expanded_casefolding.size()];
>> 
>> Can be `Math.min(expanded_casefolding.size(), end - start)` in case the 
>> table grows large, and update the `off < expanded.length` check below too.
>
> The table itself probably isn't going to grow significantly anytime soon, and 
> we’ll likely have enough time to adjust if CaseFolding.txt does get 
> substantially bigger.
> 
> That said, I probably should consider reversing the lookup logic: instead of 
> iterating through [start, end], we could iterate over the expansion table and 
> check whether any of its code points fall within the input range, at least 
> when the range size is larger than the size of the table, kinda O(n) vs 
> O(1)-ish.

updated the lookup logic as discussed.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2207809731


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]

2025-07-15 Thread Naoto Sato
On Tue, 15 Jul 2025 17:44:00 GMT, Xueming Shen  wrote:

>> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
>> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), 
>> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
>> 
>> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
>> which requires that simple case folding be applied to literals and 
>> (optionally) to character classes. When applied to character classes, each 
>> class is expected to be closed under simple case folding. See the standard 
>> for a detailed explanation of what it means for a class to be “**_closed_**.”
>> 
>> **RL1.5 states**: 
>> 
>> To meet this requirement, an implementation that supports case-sensitive 
>> matching should
>> 
>> 1. Provide at least the simple, default Unicode case-insensitive 
>> matching, and
>> 2. Specify which character properties or constructs are closed under the 
>> matching.
>> 
>> **In the Pattern implementation**, 5 types of constructs may be affected by 
>> case sensitivity:
>> 
>> 1. back-refs
>> 2. string slices (sequences)
>> 3. single character,
>> 4. character families (Unicode Properties ...), and
>> 5. character class ranges
>> 
>> **Note**: Single characters and families may appear independently or within 
>> a character class.
>> 
>> For case-insensitive (loose) matching, the implementation already applies 
>> Character.toUpperCase() and Character.toLowerCase() to **both the pattern 
>> and the input string** for back-refs, slices, and single characters. This 
>> effectively makes these constructs closed under case folding.
>> 
>> This has been verified in the newly added test case 
>> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
>> 
>> For example:
>> 
>> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
>> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
>> 
>> The character properties (families)  are not "closed" and should remain 
>> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
>> specified (TBD: update javadoc to reflect this).
>> 
>> **Current Non-Conformance: Character Class Ranges**, as reported in the 
>> original bug report.
>> 
>> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
>> vs
>> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
>> 
>> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
>> "i" modifier)
>> 
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
>> false
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{0...
>
> Xueming Shen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   update to fix the typo

Marked as reviewed by naoto (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/26285#pullrequestreview-3021537588


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]

2025-07-15 Thread Xueming Shen
> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus 
> RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
> 
> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
> which requires that simple case folding be applied to literals and 
> (optionally) to character classes. When applied to character classes, each 
> class is expected to be closed under simple case folding. See the standard 
> for a detailed explanation of what it means for a class to be “**_closed_**.”
> 
> **RL1.5 states**: 
> 
> To meet this requirement, an implementation that supports case-sensitive 
> matching should
> 
> 1. Provide at least the simple, default Unicode case-insensitive 
> matching, and
> 2. Specify which character properties or constructs are closed under the 
> matching.
> 
> **In the Pattern implementation**, 5 types of constructs may be affected by 
> case sensitivity:
> 
> 1. back-refs
> 2. string slices (sequences)
> 3. single character,
> 4. character families (Unicode Properties ...), and
> 5. character class ranges
> 
> **Note**: Single characters and families may appear independently or within a 
> character class.
> 
> For case-insensitive (loose) matching, the implementation already applies 
> Character.toUpperCase() and Character.toLowerCase() to **both the pattern and 
> the input string** for back-refs, slices, and single characters. This 
> effectively makes these constructs closed under case folding.
> 
> This has been verified in the newly added test case 
> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
> 
> For example:
> 
> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
> 
> The character properties (families)  are not "closed" and should remain 
> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
> specified (TBD: update javadoc to reflect this).
> 
> **Current Non-Conformance: Character Class Ranges**, as reported in the 
> original bug report.
> 
> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
> vs
> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
> 
> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
> "i" modifier)
> 
> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
> false
> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/**_i_** ? "true\n" : 
> "false\n"'. => **_true_**
> 
> The root issue is that the ran...

Xueming Shen has updated the pull request incrementally with one additional 
commit since the last revision:

  update to fix the typo

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/26285/files
  - new: https://git.openjdk.org/jdk/pull/26285/files/b85f581f..a090888f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26285&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26285.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26285/head:pull/26285

PR: https://git.openjdk.org/jdk/pull/26285


Re: RFR: 8361076: Add benchmark for ImageReader in preparation for Valhalla changes [v4]

2025-07-15 Thread David Beaumont
On Tue, 1 Jul 2025 17:02:27 GMT, David Beaumont  wrote:

>> Initial benchmark to capture at least some comparative measures of 
>> ImageReader performance.
>> 
>> Current results on my laptop:
>> 
>> Benchmark Mode  Cnt   ScoreError 
>>  Units
>> NewImageBenchmark.warmCache_CountAllNodes avgt5   0.785 ±  0.140 
>>  ms/op
>> NewImageBenchmark.coldStart_CountOnly   ss5  34.051 ± 17.662 
>>  ms/op
>> NewImageBenchmark.coldStart_InitAndCountss5  31.006 ±  9.674 
>>  ms/op
>> NewImageBenchmark.coldStart_LoadJavacInitClassesss5   3.752 ±  6.873 
>>  ms/op
>> 
>> The upcoming changes to ImageReader should show significant improvements to 
>> these results.
>
> David Beaumont has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adding comment about maintainability.

Oddly while the status at the bottom of the conversation says it has "2 
approvals", the tickbox:


Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 
1 Author


is not checked. I can try to integrate, but it might not work.

-

PR Comment: https://git.openjdk.org/jdk/pull/26044#issuecomment-3075753460


Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented

2025-07-15 Thread Mikael Vidstedt
On Tue, 15 Jul 2025 09:58:02 GMT, Volkan Yazici  wrote:

> `open/doc/building.md` uses the incorrect `OEL6` keyword to denote the Oracle 
> Enterprise Linux base OS distribution for Devkit – replaced it with the 
> correct one instead, i.e., `OL`.

You also need to regenerate the corresponding building.html using the 
`update-build-docs` make target

doc/building.md line 1289:

> 1287: 
> 1288: `BASE_OS` must be one of `OL` for Oracle Enterprise Linux or `Fedora` 
> (if
> 1289: not specified `OL` will be the default). If the base OS is `Fedora` the

Ever so slightly unrelated to your change, but perhaps worth fixing at the same 
time, even if it means widening the scope of the issue a bit:

There is no default, so suggest dropping the parenthesis part.

doc/building.md line 1291:

> 1289: not specified `OL` will be the default). If the base OS is `Fedora` the
> 1290: corresponding Fedora release can be specified with the help of the
> 1291: `BASE_OS_VERSION` option (with `27` as default version). If the build is

Also unrelated but maybe worth fixing: the default Fedora version is 41 
nowadays.

-

PR Review: https://git.openjdk.org/jdk/pull/26311#pullrequestreview-3021243737
PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208015960
PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208016509


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Xueming Shen
On Tue, 15 Jul 2025 15:11:07 GMT, Xueming Shen  wrote:

>> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
>> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), 
>> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
>> 
>> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
>> which requires that simple case folding be applied to literals and 
>> (optionally) to character classes. When applied to character classes, each 
>> class is expected to be closed under simple case folding. See the standard 
>> for a detailed explanation of what it means for a class to be “**_closed_**.”
>> 
>> **RL1.5 states**: 
>> 
>> To meet this requirement, an implementation that supports case-sensitive 
>> matching should
>> 
>> 1. Provide at least the simple, default Unicode case-insensitive 
>> matching, and
>> 2. Specify which character properties or constructs are closed under the 
>> matching.
>> 
>> **In the Pattern implementation**, 5 types of constructs may be affected by 
>> case sensitivity:
>> 
>> 1. back-refs
>> 2. string slices (sequences)
>> 3. single character,
>> 4. character families (Unicode Properties ...), and
>> 5. character class ranges
>> 
>> **Note**: Single characters and families may appear independently or within 
>> a character class.
>> 
>> For case-insensitive (loose) matching, the implementation already applies 
>> Character.toUpperCase() and Character.toLowerCase() to **both the pattern 
>> and the input string** for back-refs, slices, and single characters. This 
>> effectively makes these constructs closed under case folding.
>> 
>> This has been verified in the newly added test case 
>> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
>> 
>> For example:
>> 
>> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
>> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
>> 
>> The character properties (families)  are not "closed" and should remain 
>> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
>> specified (TBD: update javadoc to reflect this).
>> 
>> **Current Non-Conformance: Character Class Ranges**, as reported in the 
>> original bug report.
>> 
>> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
>> vs
>> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
>> 
>> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
>> "i" modifier)
>> 
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
>> false
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{0...
>
> Xueming Shen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   improve the lookup logic and test case for +00df

Thanks for the reviews!

-

PR Comment: https://git.openjdk.org/jdk/pull/26285#issuecomment-3074413884


Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented [v2]

2025-07-15 Thread Volkan Yazici
On Tue, 15 Jul 2025 16:46:24 GMT, Mikael Vidstedt  wrote:

>> Volkan Yazici has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Apply review feedback
>
> doc/building.md line 1289:
> 
>> 1287: 
>> 1288: `BASE_OS` must be one of `OL` for Oracle Enterprise Linux or `Fedora` 
>> (if
>> 1289: not specified `OL` will be the default). If the base OS is `Fedora` the
> 
> Ever so slightly unrelated to your change, but perhaps worth fixing at the 
> same time, even if it means widening the scope of the issue a bit:
> 
> There is no default, so suggest dropping the parenthesis part.

Removed in 0d85899ae1a.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208283263


Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented [v2]

2025-07-15 Thread Volkan Yazici
> `open/doc/building.md` uses the incorrect `OEL6` keyword to denote the Oracle 
> Enterprise Linux base OS distribution for Devkit – replaced it with the 
> correct one instead, i.e., `OL`.

Volkan Yazici has updated the pull request incrementally with one additional 
commit since the last revision:

  Apply review feedback

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/26311/files
  - new: https://git.openjdk.org/jdk/pull/26311/files/d267739d..0d85899a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26311&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26311&range=00-01

  Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/26311.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26311/head:pull/26311

PR: https://git.openjdk.org/jdk/pull/26311


Integrated: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char

2025-07-15 Thread Xueming Shen
On Mon, 14 Jul 2025 04:53:13 GMT, Xueming Shen  wrote:

> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus 
> RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
> 
> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
> which requires that simple case folding be applied to literals and 
> (optionally) to character classes. When applied to character classes, each 
> class is expected to be closed under simple case folding. See the standard 
> for a detailed explanation of what it means for a class to be “**_closed_**.”
> 
> **RL1.5 states**: 
> 
> To meet this requirement, an implementation that supports case-sensitive 
> matching should
> 
> 1. Provide at least the simple, default Unicode case-insensitive 
> matching, and
> 2. Specify which character properties or constructs are closed under the 
> matching.
> 
> **In the Pattern implementation**, 5 types of constructs may be affected by 
> case sensitivity:
> 
> 1. back-refs
> 2. string slices (sequences)
> 3. single character,
> 4. character families (Unicode Properties ...), and
> 5. character class ranges
> 
> **Note**: Single characters and families may appear independently or within a 
> character class.
> 
> For case-insensitive (loose) matching, the implementation already applies 
> Character.toUpperCase() and Character.toLowerCase() to **both the pattern and 
> the input string** for back-refs, slices, and single characters. This 
> effectively makes these constructs closed under case folding.
> 
> This has been verified in the newly added test case 
> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
> 
> For example:
> 
> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
> 
> The character properties (families)  are not "closed" and should remain 
> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
> specified (TBD: update javadoc to reflect this).
> 
> **Current Non-Conformance: Character Class Ranges**, as reported in the 
> original bug report.
> 
> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
> vs
> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
> 
> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
> "i" modifier)
> 
> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
> false
> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/**_i_** ? "true\n" : 
> "false\n"'. => **_true_**
> 
> The root issue is that the ran...

This pull request has now been integrated.

Changeset: 401af27b
Author:Xueming Shen 
URL:   
https://git.openjdk.org/jdk/commit/401af27b9dbc701eb48e5bc685d3ad058e0de3bc
Stats: 2084 lines in 9 files changed: 2079 ins; 0 del; 5 mod

8360459: UNICODE_CASE and character class with non-ASCII range does not match 
ASCII char

Reviewed-by: naoto

-

PR: https://git.openjdk.org/jdk/pull/26285


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]

2025-07-15 Thread Xueming Shen
On Tue, 15 Jul 2025 17:47:29 GMT, Xueming Shen  wrote:

>> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
>> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), 
>> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
>> 
>> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
>> which requires that simple case folding be applied to literals and 
>> (optionally) to character classes. When applied to character classes, each 
>> class is expected to be closed under simple case folding. See the standard 
>> for a detailed explanation of what it means for a class to be “**_closed_**.”
>> 
>> **RL1.5 states**: 
>> 
>> To meet this requirement, an implementation that supports case-sensitive 
>> matching should
>> 
>> 1. Provide at least the simple, default Unicode case-insensitive 
>> matching, and
>> 2. Specify which character properties or constructs are closed under the 
>> matching.
>> 
>> **In the Pattern implementation**, 5 types of constructs may be affected by 
>> case sensitivity:
>> 
>> 1. back-refs
>> 2. string slices (sequences)
>> 3. single character,
>> 4. character families (Unicode Properties ...), and
>> 5. character class ranges
>> 
>> **Note**: Single characters and families may appear independently or within 
>> a character class.
>> 
>> For case-insensitive (loose) matching, the implementation already applies 
>> Character.toUpperCase() and Character.toLowerCase() to **both the pattern 
>> and the input string** for back-refs, slices, and single characters. This 
>> effectively makes these constructs closed under case folding.
>> 
>> This has been verified in the newly added test case 
>> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
>> 
>> For example:
>> 
>> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
>> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
>> 
>> The character properties (families)  are not "closed" and should remain 
>> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
>> specified (TBD: update javadoc to reflect this).
>> 
>> **Current Non-Conformance: Character Class Ranges**, as reported in the 
>> original bug report.
>> 
>> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
>> vs
>> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
>> 
>> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
>> "i" modifier)
>> 
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
>> false
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{0...
>
> Xueming Shen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   update to fix the typo

Thanks for the reviews!

-

PR Comment: https://git.openjdk.org/jdk/pull/26285#issuecomment-3074710596


Re: RFR: 8362244: Devkit's Oracle Linux base OS keyword is incorrectly documented [v2]

2025-07-15 Thread Volkan Yazici
On Tue, 15 Jul 2025 16:46:44 GMT, Mikael Vidstedt  wrote:

>> Volkan Yazici has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Apply review feedback
>
> doc/building.md line 1291:
> 
>> 1289: not specified `OL` will be the default). If the base OS is `Fedora` the
>> 1290: corresponding Fedora release can be specified with the help of the
>> 1291: `BASE_OS_VERSION` option (with `27` as default version). If the build 
>> is
> 
> Also unrelated but maybe worth fixing: the default Fedora version is 41 
> nowadays.

In 0d85899ae1a, I've removed the entire `(with ... as default version)` snippet 
– I doubt if it is worth the effort of trying to keep up with that variable, 
i.e., the Fedora version. We neither share it for OL. Please let me know if you 
agree or disagree.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/26311#discussion_r2208293042


RE: malloc/calloc return value NULL check

2025-07-15 Thread Baesken, Matthias
Regarding calloc – checking   ,

jdk.incubator.vector/unix/native/libsleef

seems to have quite a lot  of callocs without a NULL check afterwards , but 
afaik it is  3rd party coding so we will probably not touch it.


For sunFont.c  I see no direct handling  after calloc in the C code  :

java.desktop/share/native/libfontmanager/sunFont.c-67- */
java.desktop/share/native/libfontmanager/sunFont.c-68-JNIEXPORT jlong JNICALL 
Java_sun_font_NullFontScaler_getGlyphImage
java.desktop/share/native/libfontmanager/sunFont.c-69-  (JNIEnv *env, jobject 
scaler, jlong pContext, jint glyphCode) {
java.desktop/share/native/libfontmanager/sunFont.c:70:void *nullscaler = 
calloc(1, sizeof(GlyphInfo));
java.desktop/share/native/libfontmanager/sunFont.c-71-return 
ptr_to_jlong(nullscaler);
java.desktop/share/native/libfontmanager/sunFont.c-72-}
java.desktop/share/native/libfontmanager/sunFont.c-73-
--
java.desktop/share/native/libfontmanager/sunFont.c-303-JNIEXPORT jlong JNICALL
java.desktop/share/native/libfontmanager/sunFont.c-304-Java_sun_font_StrikeCache_getInvisibleGlyphPtr(JNIEnv
 *env, jclass cls) {
java.desktop/share/native/libfontmanager/sunFont.c-305-
java.desktop/share/native/libfontmanager/sunFont.c:306:GlyphInfo *info = 
(GlyphInfo*) calloc(1, sizeof(GlyphInfo));
java.desktop/share/native/libfontmanager/sunFont.c-307-return 
(jlong)(uintptr_t)info; /* invisible glyph */
java.desktop/share/native/libfontmanager/sunFont.c-308-}


(but in the Java coding  calling this,  we seem to have some special  checks 
for 0, so maybe it is fine)


Here

java.base/windows/native/libjli/java_md.c:951:appArgIdx = calloc(argc, 
sizeof(int));
java.base/windows/native/libjli/java_md.c-952-for (i = idx, j = 0; i < 
stdargc; i++) {
java.base/windows/native/libjli/java_md.c-953-if (isTool) { // filter 
-J used by tools to pass JVM options
java.base/windows/native/libjli/java_md.c-954-arg = stdargs[i].arg;

we seem to miss a check, should I open a JBS issue ?


Best regards, Matthias


From: Thomas Stüfe 
Sent: Friday, 11 July 2025 18:19
To: Baesken, Matthias 
Cc: build-dev@openjdk.org
Subject: Re: malloc/calloc return value NULL check

Absolutely, yes.

The larger the allocated size, the more important. Linux kernel, by default, 
only protects a small area against NULL accesses; depending on distro, 4KB or 
64 (?) KB. And the JVM, at various places, allocates in low-area ranges. So 
accessing NULL+ can actually land you at a valid unrelated 
address instead of faulting.

/Thomas

On Fri, Jul 11, 2025 at 2:57 PM Baesken, Matthias 
mailto:matthias.baes...@sap.com>> wrote:
Hi, when playing around with the  GCC static analyzer  ( 
https://developers.redhat.com/articles/2022/04/12/state-static-analysis-gcc-12-compiler
 )   I noticed
a lot of complaints about  missing  NULL checks of  malloc/calloc  return 
values in the code base.

While we check these return values for NULL at a lot of places in the codebase, 
 it is not done always.
Should we do it always  (except 3rd party code probably where we do not want to 
have large diffs to upstream) ?

Or is it considered not important enough  to do it always?

Best regards, Matthias


Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Naoto Sato
On Tue, 15 Jul 2025 15:11:07 GMT, Xueming Shen  wrote:

>> Regex class should conform to **_Level 1_** of [Unicode Technical Standard 
>> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), 
>> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
>> 
>> This PR primarily addresses conformance with RL1.5: Simple Loose Matches, 
>> which requires that simple case folding be applied to literals and 
>> (optionally) to character classes. When applied to character classes, each 
>> class is expected to be closed under simple case folding. See the standard 
>> for a detailed explanation of what it means for a class to be “**_closed_**.”
>> 
>> **RL1.5 states**: 
>> 
>> To meet this requirement, an implementation that supports case-sensitive 
>> matching should
>> 
>> 1. Provide at least the simple, default Unicode case-insensitive 
>> matching, and
>> 2. Specify which character properties or constructs are closed under the 
>> matching.
>> 
>> **In the Pattern implementation**, 5 types of constructs may be affected by 
>> case sensitivity:
>> 
>> 1. back-refs
>> 2. string slices (sequences)
>> 3. single character,
>> 4. character families (Unicode Properties ...), and
>> 5. character class ranges
>> 
>> **Note**: Single characters and families may appear independently or within 
>> a character class.
>> 
>> For case-insensitive (loose) matching, the implementation already applies 
>> Character.toUpperCase() and Character.toLowerCase() to **both the pattern 
>> and the input string** for back-refs, slices, and single characters. This 
>> effectively makes these constructs closed under case folding.
>> 
>> This has been verified in the newly added test case 
>> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
>> 
>> For example:
>> 
>> Pattern.compile("(?ui)\u017f").matcher("S").matches().  => true
>> Pattern.compile("(?ui)[\u017f]").matcher("S").matches()=> true
>> 
>> The character properties (families)  are not "closed" and should remain 
>> unchanged. This is acceptable per RL1.5, if the  behavior is clearly 
>> specified (TBD: update javadoc to reflect this).
>> 
>> **Current Non-Conformance: Character Class Ranges**, as reported in the 
>> original bug report.
>> 
>> Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches()  => false
>> vs
>> Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches(). => true
>> 
>> vs Perl. (Perl also claims to support the Unicode's loose match with it it's 
>> "i" modifier)
>> 
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{017f}]/ ? "true\n" : "false\n"'.  => 
>> false
>> perl -C -e 'print "S" =~ /[\x{017f}-\x{0...
>
> Xueming Shen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   improve the lookup logic and test case for +00df

Updates look good to me.

test/jdk/java/util/regex/CaseFoldingTest.java line 51:

> 49: var excluded = Set.of(
> 50: // these 'S' characters failed for known reason. they don't 
> map to their
> 51: // fording form with toUpperCase or toLowerCase, only map 
> with case-folding.

nit: fording -> folding

-

PR Review: https://git.openjdk.org/jdk/pull/26285#pullrequestreview-3021193767
PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2207985998