Re: RFR: 8350880: (zipfs) Add support for read-only zip file systems [v12]

2025-05-23 Thread Xueming Shen
On Wed, 21 May 2025 23:30:18 GMT, David Beaumont wrote: >> Adding read-only support to ZipFileSystem. >> >> The new `accessMode` environment property allows for readOnly and readWrite >> values, and ensures that the requested mode is consistent with what's >> returned. >> >> This involved a l
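
As a rough illustration of the property described above, the following sketch opens a zip file system while requesting read-only access. The key `accessMode` and the value `readOnly` are taken from the PR description; the class name, helper names, and the entry `/hello.txt` are hypothetical. A JDK without this change may simply ignore the property, so the printed `isReadOnly` value depends on the JDK in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class ReadOnlyZipSketch {

    // Creates a small zip file containing one text entry and returns its path.
    static Path createDemoZip() {
        try {
            Path zip = Files.createTempDirectory("zipfs-demo").resolve("demo.zip");
            try (FileSystem fs = FileSystems.newFileSystem(zip, Map.of("create", "true"))) {
                Files.writeString(fs.getPath("/hello.txt"), "hello");
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the zip with the (assumed) accessMode=readOnly property and reads one entry.
    static String readEntry(Path zip, String entry) {
        try (FileSystem fs = FileSystems.newFileSystem(zip, Map.of("accessMode", "readOnly"))) {
            // Reports whether the returned file system is actually read-only.
            System.out.println("isReadOnly: " + fs.isReadOnly());
            return Files.readString(fs.getPath(entry));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readEntry(createDemoZip(), "/hello.txt"));
    }
}
```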

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines

2025-06-20 Thread Xueming Shen
On Fri, 20 Jun 2025 15:52:08 GMT, Roger Riggs wrote: >> My suggestion is to call `new StringBuilder(0)` as it is possible this is >> completely unused because we always hit the `eol && sb.length() == 0` path >> below. > > The change is motivated by performance, but there will be many inputs tha

Re: RFR: 8191963: Path.equals() and File.equals() return true for two different files on Windows

2025-06-16 Thread Xueming Shen
On Thu, 12 Jun 2025 21:12:53 GMT, Brian Burkhalter wrote: > Replace logic in `java.io.WinNTFileSystem.compare(File,File)` with that from > `sun.nio.fs.WindowsPath.compareTo(Path)`. Just wondering how the Windows implementation really behaves in its "case-insensitive comparison" for "\u0131" vs "I"

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines

2025-06-17 Thread Xueming Shen
On Wed, 18 Jun 2025 01:34:14 GMT, Brian Burkhalter wrote: >> src/java.base/share/classes/java/io/Reader.java line 469: >> >>> 467: if (c == '\r' || c == '\n') >>> 468: break; >>> 469: term++; >> >> It might be worth adding a test o

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v3]

2025-07-14 Thread Xueming Shen
Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches() => true > > vs Perl. (Perl also claims to support Unicode's loose match with its
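
The two expressions quoted above can be run side by side with a small harness such as the one below. The class and method names are hypothetical. The result of the first expression depends on whether the JDK in use contains this fix, so no expected value is given for it; the second is reported as `true` in the message.

```java
import java.util.regex.Pattern;

final class LooseMatchSketch {

    // Compiles the regex and reports whether it matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+017F is LATIN SMALL LETTER LONG S; (?ui) enables Unicode-aware
        // case-insensitive matching.
        System.out.println("[\\u017f-\\u017f] vs \"S\": "
                + matches("(?ui)[\u017f-\u017f]", "S"));
        System.out.println("[S-S] vs \"\\u017f\": "
                + matches("(?ui)[S-S]", "\u017f"));
    }
}
```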

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines [v7]

2025-06-27 Thread Xueming Shen
On Fri, 27 Jun 2025 17:32:25 GMT, Brian Burkhalter wrote: >> Replaces the implementation `readAllCharsAsString().lines().toList()` with >> reading into a temporary `char` array which is then processed to detect line >> terminators and copy non-terminating characters into strings which are added
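
The approach described above (read into a temporary `char` array, detect line terminators, copy the remaining characters into strings) might look roughly like the sketch below. This is not the code from the PR: the class name, the `bufSize` parameter, and the `linesOf` helper are hypothetical, and the sketch makes no attempt at the performance tuning discussed in the thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReadLinesSketch {

    // Splits the reader's content on "\n", "\r", or "\r\n" using a char buffer.
    static List<String> readAllLines(Reader in, int bufSize) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder sb = new StringBuilder(); // carries a line across buffer refills
        char[] buf = new char[bufSize];
        boolean skipLF = false;  // last terminator was '\r'; swallow a following '\n'
        boolean pending = false; // the current line has at least one character
        int n;
        while ((n = in.read(buf)) != -1) {
            int start = 0; // first index of the current line within buf
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (skipLF) {
                    skipLF = false;
                    if (c == '\n') {
                        start = i + 1;
                        continue;
                    }
                }
                if (c == '\r' || c == '\n') {
                    sb.append(buf, start, i - start);
                    lines.add(sb.toString());
                    sb.setLength(0);
                    pending = false;
                    start = i + 1;
                    skipLF = (c == '\r');
                } else {
                    pending = true;
                }
            }
            // Keep the unterminated tail for the next refill.
            sb.append(buf, start, n - start);
        }
        if (pending) {
            lines.add(sb.toString()); // final line without a terminator
        }
        return Collections.unmodifiableList(lines);
    }

    // Convenience wrapper for in-memory input.
    static List<String> linesOf(String text, int bufSize) {
        try (Reader r = new StringReader(text)) {
            return readAllLines(r, bufSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(linesOf("a\r\nb\n\nc", 2));
    }
}
```

The deliberately tiny buffer in `main` forces a `\r\n` pair to straddle two refills, which is the case the `skipLF` flag exists for.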

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines [v8]

2025-06-28 Thread Xueming Shen
On Fri, 27 Jun 2025 19:41:01 GMT, Brian Burkhalter wrote: >> Replaces the implementation `readAllCharsAsString().lines().toList()` with >> reading into a temporary `char` array which is then processed to detect line >> terminators and copy non-terminating characters into strings which are added

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char

2025-07-14 Thread Xueming Shen
On Mon, 14 Jul 2025 05:01:17 GMT, Chen Liang wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. >> >> This PR

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v2]

2025-07-14 Thread Xueming Shen
On Mon, 14 Jul 2025 05:08:58 GMT, Chen Liang wrote: >> Xueming Shen has updated the pull request incrementally with one additional >> commit since the last revision: >> >> update to address the review comments > > make/jdk/src/classes/build/tools/generatechar

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v2]

2025-07-14 Thread Xueming Shen
("S").matches()=> true > > The character properties (families) are not "closed" and should remain > unchanged. This is acceptable per RL1.5, if the behavior is clearly > specified (TBD: update javadoc to reflect this). > > **Current Non-Conforman

RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char

2025-07-13 Thread Xueming Shen
Regex class should conform to **_Level 1_** of [Unicode Technical Standard #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. This PR primarily addresses conformance with RL1.5: Simple Loose Matches, whi

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines [v11]

2025-07-03 Thread Xueming Shen
On Wed, 2 Jul 2025 21:37:00 GMT, Brian Burkhalter wrote: >> Replaces the implementation `readAllCharsAsString().lines().toList()` with >> reading into a temporary `char` array which is then processed to detect line >> terminators and copy non-terminating characters into strings which are added

RFR: 8354490: Pattern.CANON_EQ causes a pattern to not match a string with a UNICODE variation

2025-06-25 Thread Xueming Shen
The root cause is an off-by-one bug introduced in an old change we made years ago for Pattern.CANON_EQ. See https://cr.openjdk.org/~sherman/regexCE/Note.txt for background info. As described in the writeup above the basic logic of the change is to: **generate the permutations, create the alterna
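
For readers unfamiliar with the flag, the sketch below shows the basic effect of `Pattern.CANON_EQ` using the example from the `Pattern` javadoc (`"a\u030A"` against `"\u00E5"`). It does not reproduce the variation-related failure this PR fixes; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class CanonEqSketch {

    // Matches the whole input against the regex, optionally with CANON_EQ enabled.
    static boolean matches(String regex, String input, boolean canonEq) {
        int flags = canonEq ? Pattern.CANON_EQ : 0;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030a"; // 'a' followed by COMBINING RING ABOVE
        String composed = "\u00e5";    // LATIN SMALL LETTER A WITH RING ABOVE
        System.out.println("without CANON_EQ: " + matches(decomposed, composed, false));
        System.out.println("with CANON_EQ:    " + matches(decomposed, composed, true));
    }
}
```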

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines [v4]

2025-06-25 Thread Xueming Shen
On Tue, 24 Jun 2025 18:51:04 GMT, Brian Burkhalter wrote: >> Right, the specification here requires an unmodifiable List, so an >> unmodifiable wrapper or a list from `List.copyOf()` is appropriate. > > Fixed in > [d5abfa4](https://github.com/openjdk/jdk/pull/25863/commits/d5abfa450cb3fcd604560
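
The point about returning an unmodifiable `List` can be checked with a few lines. This is a generic sketch rather than the code from the linked commit, and the class and method names are hypothetical. Note that `List.copyOf()` rejects `null` elements, whereas a `Collections.unmodifiableList()` wrapper does not.

```java
import java.util.ArrayList;
import java.util.List;

final class UnmodifiableListSketch {

    // Returns an unmodifiable snapshot of the given list.
    static List<String> snapshot(List<String> src) {
        return List.copyOf(src);
    }

    // Reports whether the list refuses structural modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(List.of("a", "b"));
        System.out.println("ArrayList rejects add: " + rejectsAdd(mutable));
        System.out.println("copyOf rejects add:    " + rejectsAdd(snapshot(mutable)));
    }
}
```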

Re: RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]

2025-07-01 Thread Xueming Shen
On Tue, 1 Jul 2025 14:33:34 GMT, Brett Okken wrote: > StreamEncoder/CharsetEncoder that is really forcing that - and the > conversion to utf-16 is required for optimal encoder performance. It might be worth exploring the idea of using string-buffer as the buffer to carry the byte[] + coder (d

Re: RFR: 8358533: Improve performance of java.io.Reader.readAllLines [v8]

2025-06-30 Thread Xueming Shen
On Fri, 27 Jun 2025 19:41:01 GMT, Brian Burkhalter wrote: >> Replaces the implementation `readAllCharsAsString().lines().toList()` with >> reading into a temporary `char` array which is then processed to detect line >> terminators and copy non-terminating characters into strings which are added

Re: RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]

2025-06-30 Thread Xueming Shen
On Tue, 1 Jul 2025 00:01:21 GMT, Shaojin Wen wrote: >> BufferedWriter -> OutputStreamWriter -> StreamEncoder >> >> In this call chain, BufferedWriter has a char[] buffer, and StreamEncoder >> has a ByteBuffer. There are two layers of cache here, or the BufferedWriter >> layer can be removed. A
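
The call chain under discussion corresponds to the common idiom below. `StreamEncoder` is an internal class that `OutputStreamWriter` delegates to, so it does not appear in user code; the class and method names in this sketch are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class WriterChainSketch {

    // Writes text through BufferedWriter -> OutputStreamWriter and returns the UTF-8 bytes.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // BufferedWriter buffers chars; the OutputStreamWriter underneath
        // encodes them to bytes before they reach the stream.
        try (Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode("h\u00e9llo").length + " bytes written");
    }
}
```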

Integrated: 8354490: Pattern.CANON_EQ causes a pattern to not match a string with a UNICODE variation

2025-06-30 Thread Xueming Shen
On Wed, 25 Jun 2025 18:51:52 GMT, Xueming Shen wrote: > The root cause is an off-by-one bug introduced in an old change we made years > ago for Pattern.CANON_EQ. > See https://cr.openjdk.org/~sherman/regexCE/Note.txt for background info. > > As described in the writeup above the

Re: RFR: 8354490: Pattern.CANON_EQ causes a pattern to not match a string with a UNICODE variation

2025-06-30 Thread Xueming Shen
On Wed, 25 Jun 2025 18:51:52 GMT, Xueming Shen wrote: > The root cause is an off-by-one bug introduced in an old change we made years > ago for Pattern.CANON_EQ. > See https://cr.openjdk.org/~sherman/regexCE/Note.txt for background info. > > As described in the writeup above the

Re: RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]

2025-06-30 Thread Xueming Shen
On Tue, 1 Jul 2025 02:38:52 GMT, Shaojin Wen wrote: > After the introduction of JEP 254 Compact Strings, many java.io-related codes > need to be optimized for Compact Strings. If we plan to remove the option > COMPACT_STRING = off, we should do these optimizations before that. Now you make me

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v2]

2025-07-14 Thread Xueming Shen
On Mon, 14 Jul 2025 18:10:53 GMT, Naoto Sato wrote: > Looks good. Thanks for adding case folding support which is long overdue 🙂 > Since this is adding a new support for casefolding for character class > ranges, I think CSR and a release note should be considered. Thanks for the review. Arguab

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Xueming Shen
Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches() => true > > vs Perl. (Perl also claims to support Unicode's loose match with its

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Xueming Shen
On Mon, 14 Jul 2025 07:28:09 GMT, Xueming Shen wrote: >> src/java.base/share/classes/jdk/internal/util/regex/CaseFolding.java.template >> line 99: >> >>> 97: */ >>> 98: public static int[] getClassRangeClosingCharacters(int start, int >>>

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]

2025-07-15 Thread Xueming Shen
Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches() => true > > vs Perl. (Perl also claims to support Unicode's loose match with its

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]

2025-07-15 Thread Xueming Shen
On Tue, 15 Jul 2025 15:11:07 GMT, Xueming Shen wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clu

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v6]

2025-07-15 Thread Xueming Shen
On Tue, 15 Jul 2025 17:47:29 GMT, Xueming Shen wrote: >> Regex class should conform to **_Level 1_** of [Unicode Technical Standard >> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), >> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clu

Integrated: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char

2025-07-15 Thread Xueming Shen
On Mon, 14 Jul 2025 04:53:13 GMT, Xueming Shen wrote: > Regex class should conform to **_Level 1_** of [Unicode Technical Standard > #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/), plus > RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters. >

Re: RFR: 8361613: System.console() should only be available for interactive terminal [v3]

2025-07-15 Thread Xueming Shen
On Tue, 15 Jul 2025 13:29:03 GMT, Alan Bateman wrote: >> src/java.base/share/classes/java/lang/System.java line 244: >> >>> 242: * >>> 243: * @return The system console, if any, otherwise {@code null}. >>> 244: * @see Console >> >> The method declaration already links to Console

Re: RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v4]

2025-07-14 Thread Xueming Shen
Pattern.compile("(?ui)[\u017f-\u017f]").matcher("S").matches() => false > vs > Pattern.compile("(?ui)[S-S]").matcher("\u017f").matches() => true > > vs Perl. (Perl also claims to support Unicode's loose match with its