Re: RFR: 8345687: Improve the implementation of SegmentFactories::allocateSegment [v6]

2025-03-12 Thread Maurizio Cimadamore
On Fri, 7 Mar 2025 17:38:13 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch improves the performance of a typical `Arena::allocate` in 
>> several ways:
>> 
>> - Delay the creation of the NativeMemorySegmentImpl. This avoids the merge 
>> of the instance with the one obtained from the call in the uncommon path, 
>> increasing the chance the object being scalar replaced.
>> - Split the allocation of over-aligned memory to a slow-path method.
>> - Align the memory to 8 bytes, allowing faster zeroing.
>> - Use a dedicated method to zero the just-allocated native memory, reduce 
>> code size and make it more straightforward.
>> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that 
>> `false` value can be constant folded.
>> 
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains nine additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into segmentallocate
>  - revert changes to CallocArena
>  - Merge branch 'master' into segmentallocate
>  - copyright
>  - Merge branch 'master' into segmentallocate
>  - wrong init
>  - move segment instance creation to SegmentFactories
>  - address review
>  - improve the implementation of SegmentFactories::allocateSegment

Looks good - just left a minor (optional) stylistic comment

test/micro/org/openjdk/bench/java/lang/foreign/AllocTest.java line 26:

> 24: package org.openjdk.bench.java.lang.foreign;
> 25: 
> 26: import java.lang.foreign.*;

I believe it would be better to leave the imports alone -- as all the other 
benchmarks also use the "more explicit" style.

-

Marked as reviewed by mcimadamore (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22610#pullrequestreview-2678029020
PR Review Comment: https://git.openjdk.org/jdk/pull/22610#discussion_r1991245773


Re: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17]

2025-03-12 Thread Thomas Schatzl
On Wed, 12 Mar 2025 12:23:50 GMT, Albert Mingkun Yang  wrote:

>> Thomas Schatzl has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 24 additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into 8342382-card-table-instead-of-dcq
>>  - * optimized RISCV gen_write_ref_array_post_barrier() implementation 
>> contributed by @RealFYang
>>  - * fix card table verification crashes: in the first refinement phase, 
>> when switching the global card tables, we need to re-check whether we are 
>> still in the same sweep epoch or not. It might have changed due to a GC 
>> interrupting acquiring the Heap_lock. Otherwise new threads will scribble on 
>> the refinement table.
>>Cause are last-minute changes before making the PR ready to review.
>>
>>  Testing: without the patch, occurs fairly frequently when continuously
>>(1 in 20) starting refinement. Does not afterward.
>>  - * ayang review 3
>>  * comments
>>  * minor refactorings
>>  - * iwalulya review
>>  * renaming
>>  * fix some includes, forward declaration
>>  - * fix whitespace
>>* additional whitespace between log tags
>>* rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the 
>> other similar rename
>>  - ayang review
>>  * renamings
>>  * refactorings
>>  - iwalulya review
>>  * comments for variables tracking to-collection-set and just dirtied 
>> cards after GC/refinement
>>  * predicate for determining whether the refinement has been disabled
>>  * some other typos/comment improvements
>>  * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with 
>> naming
>>  - * ayang review - fix comment
>>  - * iwalulya review 2
>>  * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState
>>  * some additional documentation
>>  - ... and 14 more: https://git.openjdk.org/jdk/compare/5727f166...aec95051
>
> src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263:
> 
>> 261: 
>> 262: SuspendibleThreadSetLeaver sts_leave;
>> 263: VMThread::execute(&op);
> 
> Can you elaborate what synchronization this VM op is trying to achieve?

Memory visibility for refinement threads for the references written to the 
heap. Without them, they may not have received the most recent values.
This is the same as the `StoreLoad` barriers synchronization between mutator 
and refinement threads imo.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991561707


RFR: 8351593: [JMH] test PhoneCode.Bulk reports NPE exception

2025-03-12 Thread Vladimir Ivanov
the test output was cleanup for common execution.

-

Commit messages:
 - JDK-8351593 [JMH] test PhoneCode.Bulk reports NPE exception

Changes: https://git.openjdk.org/jdk/pull/24011/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24011&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351593
  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24011.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24011/head:pull/24011

PR: https://git.openjdk.org/jdk/pull/24011


Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v4]

2025-03-12 Thread Jaikiran Pai
> Can I please get a review of this change which proposes to fix an issue 
> `java.util.zip.ZipFile` which would cause failures when multiple instances of 
> `ZipFile` using non-UTF8 `Charset` were operating against the same underlying 
> ZIP file? This addresses https://bugs.openjdk.org/browse/JDK-8347712.
> 
> ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to 
> indicate that the entry name and comment are encoded using UTF8. A 
> `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This 
> `Charset` (which defaults to UTF-8) gets used for decoding entry names and 
> comments for non-UTF8 entries.
> 
> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by 
> `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given 
> `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.
> 
> The internal implementation of `ZipFile` maintains a cache of 
> `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and 
> during construction, uses a `ZipCoder` for parsing the ZIP entries and once 
> constructed holds on to the parsed ZIP structure. Multiple instances of a 
> `ZipFile` which all correspond to the same ZIP file on the filesystem, share 
> a single instance of `Source` (after the `Source` has been constructed and 
> cached). Although `ZipFile` instances aren't expected to be thread-safe, the 
> fact that multiple different instances of `ZipFile` could be sharing the same 
> instance of `Source` in concurrent threads, mandates that the `Source` must 
> be thread-safe.
> 
> In Java 15, we did a performance optimization through 
> https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we 
> started holding on to the `ZipCoder` instance (corresponding to the `Charset` 
> provided during `ZipFile` construction) in the `Source`. This stored 
> `ZipCoder` was then used for `ZipFile` operations when working with the ZIP 
> entries. As noted previously, any non-UTF8 `ZipCoder` is not thread-safe and 
> as a result, any usages of `ZipCoder` in the `Source` makes `Source` not 
> thread-safe too. That effectively violates the requirement that `Source` must 
> be thread-safe to allow for its usage in multiple different `ZipFile` 
> instances concurrently. This then causes `ZipFile` usages to fail in 
> unexpected ways like the one shown in the linked 
> https://bugs.openjdk.org/browse/JDK-8347712.
> 
> The commit in this PR addresses the issue by not maintaining `ZipCoder` as a 
> instance field of `Source`. Instead the `ZipCoder` is now maintained in the 
> `ZipFile`,...

Jaikiran Pai has updated the pull request incrementally with one additional 
commit since the last revision:

  add code comment

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/23986/files
  - new: https://git.openjdk.org/jdk/pull/23986/files/b6486e7a..935b04ed

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23986&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23986&range=02-03

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23986.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23986/head:pull/23986

PR: https://git.openjdk.org/jdk/pull/23986


Withdrawn: 8347491: IllegalArgumentationException thrown by ThreadPoolExecutor doesn't have a useful message

2025-03-12 Thread duke
On Mon, 13 Jan 2025 16:28:52 GMT, Viktor Klang  wrote:

> Seems sensible to improve this given that it can be tricky to figure out 
> which parameter caused the exception to be thrown.

This pull request has been closed without being integrated.

-

PR: https://git.openjdk.org/jdk/pull/23081


Re: RFR: 8351435: Change JLine Console implementation back to opt-in [v2]

2025-03-12 Thread Naoto Sato
> JDK has been using JLine based Console implementation, in JDK20 as opt-in, 
> then in JDK22 as the default. While it has been providing rich functionality 
> for Console, it is increasingly difficult to maintain as a Console 
> implementation. In light of the on-ramp feature 
> (https://bugs.openjdk.org/browse/JDK-8344699), which proposes switching 
> `java.lang.IO` class to use `System.in` and `System.out` instead of Console, 
> reverting the default Console implementation to JDK's built-in one in the 
> java.base module is considered desirable. Some tests are modified along with 
> this change, among them test/jdk/java/io/Console/ConsolePromptTest.java 
> changes were contributed by @lahodaj (thanks!)

Naoto Sato has updated the pull request incrementally with one additional 
commit since the last revision:

  Reflects review comment

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/23993/files
  - new: https://git.openjdk.org/jdk/pull/23993/files/960e6dc5..92c9a344

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23993&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23993&range=00-01

  Stats: 19 lines in 1 file changed: 10 ins; 3 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/23993.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23993/head:pull/23993

PR: https://git.openjdk.org/jdk/pull/23993


Re: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v18]

2025-03-12 Thread Thomas Schatzl
> Hi all,
> 
>   please review this change that implements (currently Draft) JEP: G1: 
> Improve Application Throughput with a More Efficient Write-Barrier.
> 
> The reason for posting this early is that this is a large change, and the JEP 
> process is already taking very long with no end in sight but we would like to 
> have this ready by JDK 25.
> 
> ### Current situation
> 
> With this change, G1 will reduce the post write barrier to much more resemble 
> Parallel GC's as described in the JEP. The reason is that G1 lacks in 
> throughput compared to Parallel/Serial GC due to larger barrier.
> 
> The main reason for the current barrier is how g1 implements concurrent 
> refinement:
> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers 
> (dirty card queues - dcq) containing the location of dirtied cards. 
> Refinement threads pick up their contents to re-refine. The barrier needs to 
> enqueue card locations.
> * For correctness dirty card updates requires fine-grained synchronization 
> between mutator and refinement threads,
> * Finally there is generic code to avoid dirtying cards altogether (filters), 
> to avoid executing the synchronization and the enqueuing as much as possible.
> 
> These tasks require the current barrier to look as follows for an assignment 
> `x.a = y` in pseudo code:
> 
> 
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;// synchronize
> if (card(@x.a) == dirty_card) goto done;
> 
> *card(@x.a) = dirty
> 
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
> 
> call runtime to move thread-local-dcq into dcqs
> 
> done:
> 
> 
> Overall this post-write barrier alone is in the range of 40-50 total 
> instructions, compared to three or four(!) for parallel and serial gc.
> 
> The large size of the inlined barrier not only has a large code footprint, 
> but also prevents some compiler optimizations like loop unrolling or inlining.
> 
> There are several papers showing that this barrier alone can decrease 
> throughput by 10-20% 
> ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is 
> corroborated by some benchmarks (see links).
> 
> The main idea for this change is to not use fine-grained synchronization 
> between refinement and mutator threads, but coarse grained based on 
> atomically switching card tables. Mutators only work on the "primary" card 
> table, refinement threads on a se...

Thomas Schatzl has updated the pull request incrementally with one additional 
commit since the last revision:

  * ayang review
* remove unnecessary STSleaver
* some more documentation around to_collection_card card color

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/23739/files
  - new: https://git.openjdk.org/jdk/pull/23739/files/aec95051..3766b76c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=16-17

  Stats: 18 lines in 2 files changed: 5 ins; 4 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/23739.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739

PR: https://git.openjdk.org/jdk/pull/23739


Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

2025-03-12 Thread Eirik Bjørsnøs
On Wed, 12 Mar 2025 06:42:43 GMT, Jaikiran Pai  wrote:

>> src/java.base/share/classes/java/util/zip/ZipFile.java line 376:
>> 
>>> 374:  * @param pos the entry position
>>> 375:  */
>>> 376: private static boolean useUTF8Coder(final byte[] cen, final int 
>>> pos) {
>> 
>> This seems mostly used to determine which `ZipCoder` to pick. Could we fold 
>> this into the method, calling it `zipCoderForPos` and make it just return 
>> the `ZipCoder`, like we had in `Source`?
>> 
>> If it needs to be static, then the non-UTF8/fallback ZipCoder could be 
>> passed as a parameter?
>> 
>> If a method to determine whether a CEN entry uses UTF-8 is still needed, 
>> then I think it would be more appropriately named  `isUTF8Entry`.
>
>> This seems mostly used to determine which ZipCoder to pick. Could we fold 
>> this into the method, calling it zipCoderForPos and make it just return the 
>> ZipCoder, like we had in Source?
> 
> I intentionally removed the `zipCoderForPos()` method and instead introduced 
> this static method to avoid the case where the code ends up calling this 
> method and then stores the returned `ZipCoder` (like was being done in 
> `Source`). Instead with this new method, it now allows the call sites to 
> determine if a UTF8 `ZipCoder` is needed or a non-UTF8 one and then decide 
> where to get the non-UTF8 `ZipCoder` from. If the call site is a instance 
> method in `ZipFile`, then it was use the `ZipFile`'s `zipCoder` field and if 
> the call site is within `Source`, when the `Source` is being instantiated 
> then it constructs a non-UTF8 `ZipCoder` afresh. That level of detail is 
> better dealt at the call site instead of a method like `zipCoderForPos()`.

Hello Jaikiran,

I'm not convinced dropping `zipCoderForPos` is a step forward:

* `zipCoderForPos` included a short-circuit optimization logic which skipped 
reading the CENFLG field in the case where the opening charset was UTF-8 (which 
it commonly is, and always for JarFile). This optimization seems lost in the 
currrent state of this PR and could impact lookup performance. It would be 
strange to repeat this optimization at all call sites.
* The only thing differing between callsites seems to be which ZipCoder to use 
as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly be 
passed in as a parameter, and will de-duplicate logic at the call sites.
* The only "cumbersome" call site seems to be `initCEN`, caused by the lazy 
initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems 
questionable: First, ZipCoder is already lazy in the way it initializes 
encoders and decoders. An unused ZipCoder is just a thin wrapper around a 
Charset and should not incur significant cost until it gets used. Second, the 
`ZipFile` constructor actually constructs a `ZipCoder` instance, would it not 
be better to just pass this down to initCEN as a parameter? That would avoid 
the cost of initializing encoders and decoders twice for the common case of 
single-threaded `ZipFile` access, once for initCEN, then again on the first 
lookup.  

Here's a patch for a quick implementation of the above on top of your PR. (Code 
speeks better than words some times :-)

This retains the CENFLG optimization, simplifies logic at call sites and 
prevents duplicated initialization of ZipCoder beteween initCEN and lookups:


Index: src/java.base/share/classes/java/util/zip/ZipFile.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java 
b/src/java.base/share/classes/java/util/zip/ZipFile.java
--- a/src/java.base/share/classes/java/util/zip/ZipFile.java(revision 
935b04ed00522aa92105292baa0693c55b1356ae)
+++ b/src/java.base/share/classes/java/util/zip/ZipFile.java(date 
1741800197915)
@@ -201,8 +201,8 @@
 this.fileName = file.getName();
 long t0 = System.nanoTime();
 
-this.res = new CleanableResource(this, charset, file, mode);
 this.zipCoder = ZipCoder.get(charset);
+this.res = new CleanableResource(this, charset, zipCoder, file, mode);
 
 PerfCounter.getZipFileOpenTime().addElapsedTimeFrom(t0);
 PerfCounter.getZipFileCount().increment();
@@ -368,13 +368,19 @@
 }
 
 /**
- * {@return true if the ZIP entry corresponding to the given {@code pos} 
uses UTF-8
- * for entry name and comment field encoding, false otherwise}
+ * {@return a ZipCoder for decoding name and comment fields for the given 
CEN entry position}
  * @param cen the CEN
  * @param pos the entry position
+ * @param fallback the ZipCoder to return when the entry is not flagged as 
UTF-8
  */
-private static boolean useUTF8Coder(final byte[] cen, final int pos) {
-return (CENFLG(cen, pos) & USE_UTF8) != 0;
+private static ZipCoder zipCoderFor(final byte[] cen, final int pos, 
Z

Re: RFR: 8351374: Improve comment about queue.remove timeout in CleanerImpl.run

2025-03-12 Thread Kim Barrett
On Mon, 10 Mar 2025 23:14:23 GMT, Brent Christian  wrote:

>> Please review this revision of a previously puzzling comment intending to
>> provide the rationale for a bit of non-obvious code.
>
> src/java.base/share/classes/jdk/internal/ref/CleanerImpl.java line 142:
> 
>> 140: // while there are registered cleanables for other 
>> objects.
>> 141: // If the application cleans all remaining cleanables, 
>> there
>> 142: // won't be any references enqueued to unblock this.  
>> Using a
> 
> I agree that calling `queue.remove()` with a timeout is the right approach.
> But I have a question:
> In the case where the Cleaner's `CleanerCleanable` has already run, and we 
> get to processing the last registered cleanable on `activeList`:
> When we do the `ref.clean()`, the `activeList` becomes empty, and we'll drop 
> out of the `while()` loop. So I'm not seeing how we would attempt another 
> `queue.remove()` in this case.
> What am I missing?

You are missing that this loop is not the only place that potentially removes
references from `activeList`. The application may also do so, when directly
cleaning (as part of a `close()` operation or the like). So the cleaner's
cleanable may be dropped, removed, and cleaned but with live entries still in
`activeList`, and this loop ends up blocked in `remove` because there's
nothing for it to do.  The application later closes all of the remaining
entries in the `activeList`, which doesn't cause anything to be enqueued on
the cleaner's queue, so the cleaner thread remains blocked in `remove`.  But
now `activeList` is empty and will never be added to, and the queue is also
empty, and the thread is blocked in `remove` with nothing (other than the
timeout) to break it out.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23952#discussion_r1992205243


Re: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17]

2025-03-12 Thread Albert Mingkun Yang
On Wed, 12 Mar 2025 11:58:45 GMT, Thomas Schatzl  wrote:

>> Hi all,
>> 
>>   please review this change that implements (currently Draft) JEP: G1: 
>> Improve Application Throughput with a More Efficient Write-Barrier.
>> 
>> The reason for posting this early is that this is a large change, and the 
>> JEP process is already taking very long with no end in sight but we would 
>> like to have this ready by JDK 25.
>> 
>> ### Current situation
>> 
>> With this change, G1 will reduce the post write barrier to much more 
>> resemble Parallel GC's as described in the JEP. The reason is that G1 lacks 
>> in throughput compared to Parallel/Serial GC due to larger barrier.
>> 
>> The main reason for the current barrier is how g1 implements concurrent 
>> refinement:
>> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of 
>> buffers (dirty card queues - dcq) containing the location of dirtied cards. 
>> Refinement threads pick up their contents to re-refine. The barrier needs to 
>> enqueue card locations.
>> * For correctness dirty card updates requires fine-grained synchronization 
>> between mutator and refinement threads,
>> * Finally there is generic code to avoid dirtying cards altogether 
>> (filters), to avoid executing the synchronization and the enqueuing as much 
>> as possible.
>> 
>> These tasks require the current barrier to look as follows for an assignment 
>> `x.a = y` in pseudo code:
>> 
>> 
>> // Filtering
>> if (region(@x.a) == region(y)) goto done; // same region check
>> if (y == null) goto done; // null value check
>> if (card(@x.a) == young_card) goto done;  // write to young gen check
>> StoreLoad;// synchronize
>> if (card(@x.a) == dirty_card) goto done;
>> 
>> *card(@x.a) = dirty
>> 
>> // Card tracking
>> enqueue(card-address(@x.a)) into thread-local-dcq;
>> if (thread-local-dcq is not full) goto done;
>> 
>> call runtime to move thread-local-dcq into dcqs
>> 
>> done:
>> 
>> 
>> Overall this post-write barrier alone is in the range of 40-50 total 
>> instructions, compared to three or four(!) for parallel and serial gc.
>> 
>> The large size of the inlined barrier not only has a large code footprint, 
>> but also prevents some compiler optimizations like loop unrolling or 
>> inlining.
>> 
>> There are several papers showing that this barrier alone can decrease 
>> throughput by 10-20% 
>> ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is 
>> corroborated by some benchmarks (see links).
>> 
>> The main idea for this change is to not use fine-grained synchronization 
>> between refinement and mutator threads, but coarse grained based on 
>> atomically switching c...
>
> Thomas Schatzl has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 24 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into 8342382-card-table-instead-of-dcq
>  - * optimized RISCV gen_write_ref_array_post_barrier() implementation 
> contributed by @RealFYang
>  - * fix card table verification crashes: in the first refinement phase, when 
> switching the global card tables, we need to re-check whether we are still in 
> the same sweep epoch or not. It might have changed due to a GC interrupting 
> acquiring the Heap_lock. Otherwise new threads will scribble on the 
> refinement table.
>Cause are last-minute changes before making the PR ready to review.
>
>  Testing: without the patch, occurs fairly frequently when continuously
>(1 in 20) starting refinement. Does not afterward.
>  - * ayang review 3
>  * comments
>  * minor refactorings
>  - * iwalulya review
>  * renaming
>  * fix some includes, forward declaration
>  - * fix whitespace
>* additional whitespace between log tags
>* rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the 
> other similar rename
>  - ayang review
>  * renamings
>  * refactorings
>  - iwalulya review
>  * comments for variables tracking to-collection-set and just dirtied 
> cards after GC/refinement
>  * predicate for determining whether the refinement has been disabled
>  * some other typos/comment improvements
>  * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with 
> naming
>  - * ayang review - fix comment
>  - * iwalulya review 2
>  * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState
>  * some additional documentation
>  - ... and 14 more: https://git.openjdk.org/jdk/compare/53a66058...aec95051

src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 217:

> 215: 
> 216:   {
> 217: SuspendibleThreadSetLeaver sts_leave;

Can you add some comment on why leaving the set is required? It's not obvious 
to me why. I'd expect handshake to work out of the box...

src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263:

> 261: 
> 262: SuspendibleThreadSetL

Re: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17]

2025-03-12 Thread Thomas Schatzl
> Hi all,
> 
>   please review this change that implements (currently Draft) JEP: G1: 
> Improve Application Throughput with a More Efficient Write-Barrier.
> 
> The reason for posting this early is that this is a large change, and the JEP 
> process is already taking very long with no end in sight but we would like to 
> have this ready by JDK 25.
> 
> ### Current situation
> 
> With this change, G1 will reduce the post write barrier to much more resemble 
> Parallel GC's as described in the JEP. The reason is that G1 lacks in 
> throughput compared to Parallel/Serial GC due to larger barrier.
> 
> The main reason for the current barrier is how g1 implements concurrent 
> refinement:
> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers 
> (dirty card queues - dcq) containing the location of dirtied cards. 
> Refinement threads pick up their contents to re-refine. The barrier needs to 
> enqueue card locations.
> * For correctness dirty card updates requires fine-grained synchronization 
> between mutator and refinement threads,
> * Finally there is generic code to avoid dirtying cards altogether (filters), 
> to avoid executing the synchronization and the enqueuing as much as possible.
> 
> These tasks require the current barrier to look as follows for an assignment 
> `x.a = y` in pseudo code:
> 
> 
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done;  // write to young gen check
> StoreLoad;// synchronize
> if (card(@x.a) == dirty_card) goto done;
> 
> *card(@x.a) = dirty
> 
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
> 
> call runtime to move thread-local-dcq into dcqs
> 
> done:
> 
> 
> Overall this post-write barrier alone is in the range of 40-50 total 
> instructions, compared to three or four(!) for parallel and serial gc.
> 
> The large size of the inlined barrier not only has a large code footprint, 
> but also prevents some compiler optimizations like loop unrolling or inlining.
> 
> There are several papers showing that this barrier alone can decrease 
> throughput by 10-20% 
> ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is 
> corroborated by some benchmarks (see links).
> 
> The main idea for this change is to not use fine-grained synchronization 
> between refinement and mutator threads, but coarse grained based on 
> atomically switching card tables. Mutators only work on the "primary" card 
> table, refinement threads on a se...

Thomas Schatzl has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 24 additional commits 
since the last revision:

 - Merge branch 'master' into 8342382-card-table-instead-of-dcq
 - * optimized RISCV gen_write_ref_array_post_barrier() implementation 
contributed by @RealFYang
 - * fix card table verification crashes: in the first refinement phase, when 
switching the global card tables, we need to re-check whether we are still in 
the same sweep epoch or not. It might have changed due to a GC interrupting 
acquiring the Heap_lock. Otherwise new threads will scribble on the refinement 
table.
   Cause are last-minute changes before making the PR ready to review.
   
 Testing: without the patch, occurs fairly frequently when continuously
   (1 in 20) starting refinement. Does not afterward.
 - * ayang review 3
 * comments
 * minor refactorings
 - * iwalulya review
 * renaming
 * fix some includes, forward declaration
 - * fix whitespace
   * additional whitespace between log tags
   * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other 
similar rename
 - ayang review
 * renamings
 * refactorings
 - iwalulya review
 * comments for variables tracking to-collection-set and just dirtied cards 
after GC/refinement
 * predicate for determining whether the refinement has been disabled
 * some other typos/comment improvements
 * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming
 - * ayang review - fix comment
 - * iwalulya review 2
 * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState
 * some additional documentation
 - ... and 14 more: https://git.openjdk.org/jdk/compare/f77fa17b...aec95051

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/23739/files
  - new: https://git.openjdk.org/jdk/pull/23739/files/758fac01..aec95051

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=15-16

  Stats: 78123 lines in 1539 files changed: 36243 ins; 29177 del; 12703 mod
  Patch: https://git.openjdk.org/jdk/pull/23739.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/he

Re: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14]

2025-03-12 Thread Albert Mingkun Yang
On Fri, 7 Mar 2025 13:14:02 GMT, Albert Mingkun Yang  wrote:

>> Thomas Schatzl has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   * iwalulya review
>> * renaming
>> * fix some includes, forward declaration
>
> src/hotspot/share/gc/g1/g1CardTable.hpp line 76:
> 
>> 74: g1_card_already_scanned = 0x1,
>> 75: g1_to_cset_card = 0x2,
>> 76: g1_from_remset_card = 0x4
> 
> Could you outline the motivation for this more precise info? Is it for 
> optimization or essentially for correctness?

OK, it's for better performance, not correctness. How much is the improvement? 
As I understand it, this more precise info is largely independent of the new 
barrier logic. I wonder if it makes sense to extract this out to its own ticket 
to better assess its impact on perf and impl complexity.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991375754


Re: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17]

2025-03-12 Thread Thomas Schatzl
On Wed, 12 Mar 2025 13:20:25 GMT, Albert Mingkun Yang  wrote:

>> Thomas Schatzl has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 24 additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into 8342382-card-table-instead-of-dcq
>>  - * optimized RISCV gen_write_ref_array_post_barrier() implementation 
>> contributed by @RealFYang
>>  - * fix card table verification crashes: in the first refinement phase, 
>> when switching the global card tables, we need to re-check whether we are 
>> still in the same sweep epoch or not. It might have changed due to a GC 
>> interrupting acquiring the Heap_lock. Otherwise new threads will scribble on 
>> the refinement table.
>>Cause are last-minute changes before making the PR ready to review.
>>
>>  Testing: without the patch, occurs fairly frequently when continuously
>>(1 in 20) starting refinement. Does not afterward.
>>  - * ayang review 3
>>  * comments
>>  * minor refactorings
>>  - * iwalulya review
>>  * renaming
>>  * fix some includes, forward declaration
>>  - * fix whitespace
>>* additional whitespace between log tags
>>* rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the 
>> other similar rename
>>  - ayang review
>>  * renamings
>>  * refactorings
>>  - iwalulya review
>>  * comments for variables tracking to-collection-set and just dirtied 
>> cards after GC/refinement
>>  * predicate for determining whether the refinement has been disabled
>>  * some other typos/comment improvements
>>  * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with 
>> naming
>>  - * ayang review - fix comment
>>  - * iwalulya review 2
>>  * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState
>>  * some additional documentation
>>  - ... and 14 more: https://git.openjdk.org/jdk/compare/0c7b5abb...aec95051
>
> src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 217:
> 
>> 215: 
>> 216:   {
>> 217: SuspendibleThreadSetLeaver sts_leave;
> 
> Can you add some comment on why leaving the set is required? It's not obvious 
> to me why. I'd expect handshake to work out of the box...

It isn't apparently. Removed.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991999476


Re: RFR: 8351435: Change JLine Console implementation back to opt-in [v2]

2025-03-12 Thread Naoto Sato
On Wed, 12 Mar 2025 07:13:19 GMT, Jaikiran Pai  wrote:

>> Naoto Sato has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Reflects review comment
>
> test/jdk/java/io/Console/ConsolePromptTest.java line 77:
> 
>> 75: var expect = Paths.get("/usr/bin/expect");
>> 76: if (!Files.exists(expect) || !Files.isExecutable(expect)) {
>> 77: System.out.println("'expect' command not found. Test 
>> ignored.");
> 
> Hello Naoto, I think throwing a `jtreg.SkippedException` might be better here 
> so that it's clear that the test was skipped. There have been recent 
> reporting improvements too which make it easier to notice such skipped tests.

Good point. Modified the piece (w/ some other minor changes)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23993#discussion_r1991925846


Re: RFR: 8336881: [Linux] Support for hierarchical limits for Metrics [v16]

2025-03-12 Thread Severin Gehwolf
On Thu, 20 Feb 2025 14:22:20 GMT, Severin Gehwolf  wrote:

>> Please review this fix for cgroups-based metrics reporting in the 
>> `jdk.internal.platform` package. This fix is supposed to address wrong 
>> reporting of certain limits if the limits aren't set at the leaf nodes.
>> 
>> For example, on cg v2, the memory limit interface file is `memory.max`. 
>> Consider a cgroup path of  `/a/b/c/d`. The current code only reports the 
>> limits (via Metrics) correctly if it's set at `/a/b/c/d/memory.max`. 
>> However, some systems - like a systemd slice - sets those limits further up 
>> the hierarchy. For example at `/a/b/c/memory.max`. `/a/b/c/d/memory.max` 
>> might be set to the value `max` (for unlimited), yet `/a/b/c/memory.max` 
>> would report the actual limit value (e.g. `1048576000`).
>> 
>> This patch addresses this issue by:
>> 
>> 1. Refactoring the interface lookup code to relevant controllers for 
>> cpu/memory. The CgroupSubsystem classes then delegate to those for the 
>> lookup. This facilitates having an API for the lookup of an updated limit in 
>> step 2.
>> 2. Walking the full hierarchy of the cgroup path (if any), looking for a 
>> lower limit than at the leaf. Note that it's not possible to raise the limit 
>> set at a path closer to the root via the interface file at a 
>> further-to-the-leaf-level. The odd case out seems to be `max` values on some 
>> systems (which seems to be the default value).
>> 
>> As an optimization this hierarchy walk is skipped on containerized systems 
>> (like K8S), where the limits are set in interface files at the leaf nodes of 
>> the hierarchy. Therefore there should be no change on those systems.
>> 
>> This patch depends on the Hotspot change implementing the same for the JVM 
>> so that `Metrics.isContainerized()` works correctly on affected systems 
>> where `-XshowSettings:system` currently reports `System not containerized` 
>> due to the missing JVM fix. A test framework for such hierarchical systems 
>> has been added in 
>> [JDK-8333446](https://bugs.openjdk.org/browse/JDK-8333446). This patch adds 
>> a test using that framework among some simpler unit tests.
>> 
>> Thoughts?
>> 
>> Testing:
>> 
>> - [x] GHA
>> - [x] Container tests on Linux x86_64 on cg v1 and cg v2 systems
>> - [x] Some manual testing using systemd slices
>
> Severin Gehwolf has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 42 commits:
> 
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Fix missing imports
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - Add exclusive access dirs directive for systemd tests
>  - Merge branch 'master' into jdk-8336881-metrics-systemd-slice
>  - ... and 32 more: https://git.openjdk.org/jdk/compare/e1d0a9c8...74abae5b

Keep open, please.

-

PR Comment: https://git.openjdk.org/jdk/pull/20280#issuecomment-2718491360


Re: RFR: 8351435: Change JLine Console implementation back to opt-in

2025-03-12 Thread Jaikiran Pai
On Tue, 11 Mar 2025 18:32:10 GMT, Naoto Sato  wrote:

> JDK has been using JLine based Console implementation, in JDK20 as opt-in, 
> then in JDK22 as the default. While it has been providing rich functionality 
> for Console, it is increasingly difficult to maintain as a Console 
> implementation. In light of the on-ramp feature 
> (https://bugs.openjdk.org/browse/JDK-8344699), which proposes switching 
> `java.lang.IO` class to use `System.in` and `System.out` instead of Console, 
> reverting the default Console implementation to JDK's built-in one in the 
> java.base module is considered desirable. Some tests are modified along with 
> this change, among them test/jdk/java/io/Console/ConsolePromptTest.java 
> changes were contributed by @lahodaj (thanks!)

test/jdk/java/io/Console/ConsolePromptTest.java line 77:

> 75: var expect = Paths.get("/usr/bin/expect");
> 76: if (!Files.exists(expect) || !Files.isExecutable(expect)) {
> 77: System.out.println("'expect' command not found. Test 
> ignored.");

Hello Naoto, I think throwing a `jtreg.SkippedException` might be better here 
so that it's clear that the test was skipped. There have been recent reporting 
improvements too which make it easier to notice such skipped tests.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23993#discussion_r1990775475


Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

2025-03-12 Thread Jaikiran Pai
On Tue, 11 Mar 2025 15:31:09 GMT, Eirik Bjørsnøs  wrote:

> This seems mostly used to determine which ZipCoder to pick. Could we fold 
> this into the method, calling it zipCoderForPos and make it just return the 
> ZipCoder, like we had in Source?

I intentionally removed the `zipCoderForPos()` method and instead introduced 
this static method to avoid the case where the code ends up calling this method 
and then stores the returned `ZipCoder` (like was being done in `Source`). 
Instead with this new method, it now allows the call sites to determine if a 
UTF8 `ZipCoder` is needed or a non-UTF8 one and then decide where to get the 
non-UTF8 `ZipCoder` from. If the call site is a instance method in `ZipFile`, 
then it was use the `ZipFile`'s `zipCoder` field and if the call site is within 
`Source`, when the `Source` is being instantiated then it constructs a non-UTF8 
`ZipCoder` afresh. That level of detail is better dealt at the call site 
instead of a method like `zipCoderForPos()`.

> src/java.base/share/classes/java/util/zip/ZipFile.java line 1744:
> 
>> 1742: 
>> 1743: int entryPos = pos + CENHDR;
>> 1744: final ZipCoder entryZipCoder;
> 
> If you want to construct this lazily, then I think a comment documenting that 
> purpose would be useful here.

Done.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1990711952
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1990713465


Re: RFR: 8351897: Extra closing curly brace typos in Javadoc

2025-03-12 Thread Iris Clark
On Thu, 13 Mar 2025 01:21:54 GMT, Archie Cobbs  wrote:

> This PR fixes minor Javadoc typos in a few `java.base` classes. Extra curly 
> braces have been sneaking in.

src/java.base/share/classes/java/text/MessageFormat.java line 327:

> 325:  *   {@code unit}
> 326:  *   {@link ListFormat#getInstance(Locale, ListFormat.Type, 
> ListFormat.Style)
> 327:  *   ListFormat.getInstance}{@code (getLocale()}, {@link 
> ListFormat.Type#UNIT}, {@link ListFormat.Style#FULL})

The new ')' at the end of the line.  Where's the matching '('?  I'm guessing 
that the IDE thinks it's the one immediately before the `getLocale()`, but I 
suspect that javadoc won't agree with that.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24022#discussion_r1992525288


Re: RFR: 8351897: Extra closing curly brace typos in Javadoc

2025-03-12 Thread Naoto Sato
On Thu, 13 Mar 2025 01:21:54 GMT, Archie Cobbs  wrote:

> This PR fixes minor Javadoc typos in a few `java.base` classes. Extra curly 
> braces have been sneaking in.

The change in `MessageFormat.java` looks good

-

PR Review: https://git.openjdk.org/jdk/pull/24022#pullrequestreview-2680224832


Re: RFR: 8351897: Extra closing curly brace typos in Javadoc

2025-03-12 Thread Archie Cobbs
On Thu, 13 Mar 2025 01:46:53 GMT, Iris Clark  wrote:

> Where's the matching `(`?  I'm guessing that the IDE thinks it's the one 
> immediately before the `getLocale()`, but I suspect that javadoc won't agree 
> with that.

You are correct - the matching `(` is the one preceding `getLocale()`.

I'm not sure what you mean by "I suspect that javadoc won't agree with that". 
This change is simply making this row of the table consistent with the ones 
above it. You can see how they currently appear 
[here](https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/text/MessageFormat.html).
 In this row, the closing `}` should have been a `)`.

Thanks for taking a look.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24022#discussion_r1992531755


Re: RFR: 8345687: Improve the implementation of SegmentFactories::allocateSegment [v7]

2025-03-12 Thread Quan Anh Mai
> Hi,
> 
> This patch improves the performance of a typical `Arena::allocate` in several 
> ways:
> 
> - Delay the creation of the NativeMemorySegmentImpl. This avoids the merge of 
> the instance with the one obtained from the call in the uncommon path, 
> increasing the chance the object being scalar replaced.
> - Split the allocation of over-aligned memory to a slow-path method.
> - Align the memory to 8 bytes, allowing faster zeroing.
> - Use a dedicated method to zero the just-allocated native memory, reduce 
> code size and make it more straightforward.
> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that 
> `false` value can be constant folded.
> 
> Please take a look and leave your reviews, thanks a lot.

Quan Anh Mai has updated the pull request incrementally with one additional 
commit since the last revision:

  expand wildcard imports

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/22610/files
  - new: https://git.openjdk.org/jdk/pull/22610/files/e82cc493..dba7ae14

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22610&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22610&range=05-06

  Stats: 17 lines in 1 file changed: 14 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22610.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22610/head:pull/22610

PR: https://git.openjdk.org/jdk/pull/22610


RFR: 8351897: Extra closing curly brace typos in Javadoc

2025-03-12 Thread Archie Cobbs
This PR fixes minor Javadoc typos in a few `java.base` classes. Extra curly 
braces have been sneaking in.

-

Commit messages:
 - Fix minor Javadoc typos.

Changes: https://git.openjdk.org/jdk/pull/24022/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24022&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351897
  Stats: 12 lines in 5 files changed: 0 ins; 0 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/24022.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24022/head:pull/24022

PR: https://git.openjdk.org/jdk/pull/24022