On Wed, 14 May 2025 12:47:08 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:

>> While working on improving the TLAB sizing code for ZGC @kstefanj ran into 
>> an issue with the following tests failing:
>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorInterpreterObjectTest.java
>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java
>> 
>> The reason for seeing the problems now is that with the new sizing code ZGC 
>> used smaller TLABs at first, before resizing them to a proper size (to lower 
>> the waste). In the HeapMonitor tests we don't allocate enough to trigger GCs 
>> that will resize the TLABs so most of the tests will now run with small 
>> TLABs. This should not be a problem, but it turns out the current sampling 
>> code is not working properly when you get a lot of outside TLAB allocations. 
>> You get those when trying to allocate a fairly large object (~1400B) that 
>> won't fit into the TLAB, but there are still quite a bit of room in the TLAB 
>> so we don't want to throw it away and take a new one.
>> 
>> The problem in the current code is that we keep track of when to sample with 
>> multiple variables and when getting out of TLAB allocations these get out of 
>> sync.
>> 
>> The proposed patch is the result of a restructuring and fixing of the the 
>> code that me and @kstefanj did to fix this issue.
>> 
>> The idea is to better account how much we have allocated in three different 
>> buckets:
>> * Outside of TLAB allocations
>> * Accounted TLAB allocations
>> * The last bit of TLAB allocations that hasn't been accounted yet
>> 
>> And then use the sum of that to compare against a *non-changing* threshold 
>> to see if it is time to take a sample.
>> 
>> There are a few things to think about when reading this code:
>> * The thread can allocate and retire multiple TLABs before we cross the 
>> sample threshold.
>> * The sampling can take multiple samples in a single TLAB
>> * Any overshooting of the sample threshold triggers only one sample and the 
>> extra bytes are ignored when checking for the next sample.
>> 
>> There are some restructuring in the patch to confine the ThreadHeapSampler 
>> variables and code. For example:
>> 
>> 1) Moved accounting variables out of TLAB and into the ThreadHeapSampler
>> 2) Moved thread allocation accounting and sampling code out of the TLAB
>> 3) Moved retiring of TLABs out of make_parseable (needed to support (2))
>> 
>> Some of that could be extracted into a separate PR if that's deemed 
>> worthwhile.
>> 
>> Tested with the HeapMonitor tests various TLAB sizes.
>> 
>> If there are anyone using these APIs it would be nice to get feedba...
>
> Stefan Karlsson has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Apply suggestions from code review
>   
>   Co-authored-by: Stefan Johansson 
> <54407259+kstef...@users.noreply.github.com>

src/hotspot/share/runtime/threadHeapSampler.cpp line 399:

> 397:   assert(result > 0 && result < static_cast<double>(SIZE_MAX), "Result 
> is not in an acceptable range.");
> 398:   size_t interval = static_cast<size_t>(result);
> 399:   _sample_threshold = interval;

Nit: The line 399 is not needed as the value is reset at 400.

src/hotspot/share/runtime/threadHeapSampler.hpp line 89:

> 87: 
> 88:     // Call this after _rnd is initialized to initialize 
> _bytes_until_sample.
> 89:     pick_next_sample();

The old identifier is used: _bytes_until_sample.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25114#discussion_r2093526227
PR Review Comment: https://git.openjdk.org/jdk/pull/25114#discussion_r2093523489

Reply via email to