Re: RFR: 8337217: Port VirtualMemoryTracker to use VMATree

Johan Sjölen Fri, 08 Nov 2024 02:57:59 -0800

On Fri, 27 Sep 2024 09:52:07 GMT, Afshin Zafari <azaf...@openjdk.org> wrote:


> > What is the consensus on having two implementations alive at the same time? 
> > I'd like to see us delete the old VirtualMemoryTracker and only have the 
> > tree implementation left as part of this PR. What is the status of our 
> > testing? Is the tree version equivalent now and passes all tests?
> 
> I have already told that the main reason to have 2 versions is to be able to 
> switch back from new to old version at customer site at runtime. This is what 
> Thomas has requested.

The issue with having two implementations is that it requires adding a new 
`java` option, and deprecating and removing those take 3 releases. That's 18 
months of us being required to have the old version in the codebase, and being 
required to maintain it. I don't think that's a negligible promise to make.

Rather than having 2 implementations, I'd like to see us aiming for integration 
for JDK-25 after forking 24, so integration in December. That would give us 6 
months of ensuring stability of the new implementation before it reaches any 
customers. During those 6 months it will go through all of our testing multiple 
times over, and be used by OpenJDK developers. What do you think of this idea?

> During the implementation, it is much beneficial to have access to the old 
> version to compare the results and debug problems when they occur. Until the 
> whole port to VMATree is not done, we need this 2 versions side-by-side 
> feature.

Sure, I can understand that it's nice to have both versions present during 
development. Right now it seems like we have a broken build, do you have any 
plans on having a functioning and fully featured build soon?

>> src/hotspot/share/nmt/regionsTree.hpp line 31:
>> 
>>> 29: #include "nmt/vmatree.hpp"
>>> 30: 
>>> 31: class RegionsTree : public VMATree {
>> 
>> Is it necessary for this to be a superclass? Seems like this class contains 
>> utilities for working with a `VMATree`. Could it instead be `AllStatic` and 
>> take a tree as its first argument? I'd like to avoid inheritance unless 
>> necessary (typically for the usage of virtual functions).
>
> It is inherited because it is not a new type. I think that 
> `_tree->visit_reserved_region(par1, par2)` is more readable than 
> `VMATree::visit_reserved_region(_tree, par1, par2)`
> I could not find any *virtual functions* in these classes, what do you mean 
> by that?

I'm saying that inheritance is mostly useful when we have virtual functions 
which we can specialize, and that `VMATree` has none.

>> src/hotspot/share/nmt/regionsTree.hpp line 33:
>> 
>>> 31: class RegionsTree : public VMATree {
>>> 32:  private:
>>> 33:   NativeCallStackStorage* _ncs_storage;
>> 
>> No need for this to be a pointer.
>
> How to use the ctor with a `bool` arg, then?

The bool argument is just passed along.

```c++
  RegionsTree(bool with_storage) : VMATree(), _ncs_storage(with_storage) {
  }

>> src/hotspot/share/nmt/regionsTree.hpp line 108:
>> 
>>> 106:             CommittedMemoryRegion cmr((address)base, comm_size, st);
>>> 107:             comm_size = 0;
>>> 108:             prev.clear_node();
>> 
>> This is unneeded.
>
> Which part? Why?
> `clear_node()` is the same as `prev = nullptr` if pointers were used.
> `is_valid()` is the same as `prev == nullptr` if pointers were used.

Because you immediately reassign `prev` to `curr` in L114 it's not necessary to 
first call `clear_node`.

>> src/hotspot/share/nmt/virtualMemoryTrackerWithTree.cpp line 101:
>> 
>>> 99: }
>>> 100: 
>>> 101: void 
>>> VirtualMemoryTrackerWithTree::apply_summary_diff(VMATree::SummaryDiff diff) 
>>> {
>> 
>> Applying a summary diff in the MemoryFileTracker is this:
>> 
>> ```c++
>>   for (int i = 0; i < mt_number_of_types; i++) {
>>     VirtualMemory* summary = 
>> file->_summary.by_type(NMTUtil::index_to_flag(i));
>>     summary->reserve_memory(diff.flag[i].commit);
>>     summary->commit_memory(diff.flag[i].commit);
>>   }
>> 
>> 
>> This is much simpler and doesn't require looking at signs and so on.
>
> In `MemoryFileTracker`, the changes in commit/reserve are applied to a local 
> `VirtualMemorySnapshot`. Here we have to apply them to the global 
> `VirtualMemorySummary`.

Yeah, that doesn't seem like a problem.

```c++
  for (int i = 0; i < mt_number_of_types; i++) {
    r = diff.flag[i].reserve;
    c = diff.flag[i].commit;
    flag = NMTUtil::index_to_flag(i);
    VirtualMemory* mem = VirtualMemorySummary::as_snapshot()->by_type(flag);
    reserved = mem->reserved();
    committed = mem->committed();
    mem->reserve_memory(r);
    mem->commit_memory(c);
    if ((size_t)-r > reserved) {
      print_err("release");
    }
    if ((size_t)-c > reserved || (size_t)c > committed) {
      print_err("uncommit");
    }
  }

>> src/hotspot/share/nmt/virtualMemoryTrackerWithTree.hpp line 34:
>> 
>>> 32: #include "utilities/ostream.hpp"
>>> 33: 
>>> 34: class VirtualMemoryTrackerWithTree : AllStatic {
>> 
>> Can this class be redone to use the same pattern as `MemoryFileTracker` with 
>> an outer class that's not static and an inner class that's stack, storing an 
>> instance of the outer class? That makes testing the class by creating an 
>> instance of it possible, unlike our old tests.
>
> Can we do this change as a separate RFE? It impacts the code a lot.

Then just invert it: Have the outer class be static and the inner class be an 
instance. We can change the `MemoryFileTracker` to be that, as it's not as 
large of a change.

>> src/hotspot/share/nmt/vmatree.hpp line 59:
>> 
>>> 57:   // Bit fields view: bit 0 for Reserved, bit 1 for Committed.
>>> 58:   // Setting a region as Committed preserves the Reserved state.
>>> 59:   enum class StateType : uint8_t { Reserved = 1, Committed = 3, 
>>> Released = 0, COUNT = 4 };
>> 
>> Why do we now consider `StateType` to be a bit field?
>
> As the comments above it says, and we already had a discussion on it: The 
> 'Committed' state now means 'Reserved and Committed'. So, when we visit nodes 
> and check for Reserved ones, the committed nodes are also considered as 
> Reserved. Otherwise, we would ignore the Committed nodes and continue looking 
> for Reserved nodes.

Right, I don't remember the discussion on this being a bit field. I agree, 
though, Committed also means Reserved, but it's not necessary to express that 
as a bit field. Regardless, I leave it up to you how we express this.

>> test/hotspot/gtest/nmt/test_nmt_treap.cpp line 168:
>> 
>>> 166:     tty->print_cr("d1: %lf, d2: %lf, d2/d1: %lf", d1, d2, d2 / d1);
>>> 167:     EXPECT_LT((int)(d2 / d1), N2 / N1);
>>> 168:   }
>> 
>> 1000 and 10_000 elements are both quite small, isn't one measurement prone 
>> to being slower than expected due to unforeseen circumstances? For example, 
>> if `malloc` performs a heavy allocation (by mapping in a few new  pages) 
>> during the 10_000 elements insertion then that may stall enough that the 
>> `EXPECT_LT` fails. We are also bound to only performing the test once.
>> 
>> Is your thinking that these perf tests should be kept in, or should they be 
>> removed before integrating?
>
> Generally, I think it is necessary to have performance tests to verify if any 
> future changes do not degrade the performance.
> The approach of stress testing can be agreed between us, but the tests have 
> to exist.
> In this approach, the input size is scaled N times and we expect the 
> execution time grows as O(log(N)) which is much less than N.
> This test fails and we need to have a justification for it. If approach is 
> invalid or not correct implemented, we fix it. But the expectations remains 
> the same.

Why would the execution time grow logarithmically when we do linearly more 
work? When we run this with `N2` we will perform `10_000 * log(10_000, 2)` 
units of work, and for `N1` we will perform `1_000 * log(1_000, 2)` units of 
work, approximately. A closer approximation is `O(log(1)) + O(log(2)) + ... + 
O(log(n))` for `n = N2` or `n = N1`.

Let's calculate that:


>>> import math
>>> def time_it(n):
...     t = 0
...     for i in range(1, n):
...             t  = t + math.log(i)
...     return [t, 3*t] # Approximate best and worst case
... 
>>> time_it(1000)
[5905.220423209189, 17715.661269627566]
>>> time_it(10_000)
[82099.71749644217, 246299.15248932652]
>>> def compare(a, b):
...     ab, aw = a
...     bb, bw = b
...     return [ bb / ab, bw / aw ]
... 
>>> compare(time_it(1000), time_it(10_000))
[13.902904821938064, 13.902904821938066]
>>> # Ouch, that's outside of our linear bound!


What I said previously also applies, especially the performance of `malloc` 
will impact this I suspect.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2378929190
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1711105506
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703735921
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1799085027
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703751887
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703739805
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1776820787
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703877183

Re: RFR: 8337217: Port VirtualMemoryTracker to use VMATree

Reply via email to