On Wed, 14 May 2025 12:43:48 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:
>> While working on improving the TLAB sizing code for ZGC @kstefanj ran into >> an issue with the following tests failing: >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorInterpreterObjectTest.java >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java >> >> The reason for seeing the problems now is that with the new sizing code ZGC >> used smaller TLABs at first, before resizing them to a proper size (to lower >> the waste). In the HeapMonitor tests we don't allocate enough to trigger GCs >> that will resize the TLABs so most of the tests will now run with small >> TLABs. This should not be a problem, but it turns out the current sampling >> code is not working properly when you get a lot of outside TLAB allocations. >> You get those when trying to allocate a fairly large object (~1400B) that >> won't fit into the TLAB, but there are still quite a bit of room in the TLAB >> so we don't want to throw it away and take a new one. >> >> The problem in the current code is that we keep track of when to sample with >> multiple variables and when getting out of TLAB allocations these get out of >> sync. >> >> The proposed patch is the result of a restructuring and fixing of the the >> code that me and @kstefanj did to fix this issue. >> >> The idea is to better account how much we have allocated in three different >> buckets: >> * Outside of TLAB allocations >> * Accounted TLAB allocations >> * The last bit of TLAB allocations that hasn't been accounted yet >> >> And then use the sum of that to compare against a *non-changing* threshold >> to see if it is time to take a sample. >> >> There are a few things to think about when reading this code: >> * The thread can allocate and retire multiple TLABs before we cross the >> sample threshold. >> * The sampling can take multiple samples in a single TLAB >> * Any overshooting of the sample threshold triggers only one sample and the >> extra bytes are ignored when checking for the next sample. >> >> There are some restructuring in the patch to confine the ThreadHeapSampler >> variables and code. For example: >> >> 1) Moved accounting variables out of TLAB and into the ThreadHeapSampler >> 2) Moved thread allocation accounting and sampling code out of the TLAB >> 3) Moved retiring of TLABs out of make_parseable (needed to support (2)) >> >> Some of that could be extracted into a separate PR if that's deemed >> worthwhile. >> >> Tested with the HeapMonitor tests various TLAB sizes. >> >> If there are anyone using these APIs it would be nice to get feedba... > > Stefan Karlsson has updated the pull request incrementally with one > additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Stefan Johansson > <54407259+kstef...@users.noreply.github.com> Thanks for reviewing and pair-fixing! ------------- PR Review: https://git.openjdk.org/jdk/pull/25114#pullrequestreview-2840036844