On Fri, 27 Sep 2024 09:52:07 GMT, Afshin Zafari <azaf...@openjdk.org> wrote:
> > What is the consensus on having two implementations alive at the same time? I'd like to see us delete the old VirtualMemoryTracker and only have the tree implementation left as part of this PR. What is the status of our testing? Is the tree version equivalent now, and does it pass all tests?
>
> I have already said that the main reason to have 2 versions is to be able to switch back from the new to the old version at a customer site at runtime. This is what Thomas has requested.

The issue with having two implementations is that it requires adding a new `java` option, and deprecating and removing an option takes 3 releases. That's 18 months of us being required to keep the old version in the codebase and to maintain it. I don't think that's a negligible promise to make. Rather than having 2 implementations, I'd like to see us aim for integration into JDK 25 after forking 24, so integration in December. That would give us 6 months of ensuring stability of the new implementation before it reaches any customers. During those 6 months it will go through all of our testing multiple times over, and be used by OpenJDK developers. What do you think of this idea?

> During the implementation, it is very beneficial to have access to the old version to compare the results and debug problems when they occur. Until the whole port to VMATree is done, we need this two-versions-side-by-side feature.

Sure, I can understand that it's nice to have both versions present during development. Right now it seems like we have a broken build; do you have any plans for getting to a functioning and fully featured build soon?

>> src/hotspot/share/nmt/regionsTree.hpp line 31:
>>
>>> 29: #include "nmt/vmatree.hpp"
>>> 30:
>>> 31: class RegionsTree : public VMATree {
>>
>> Is it necessary for this to be a superclass? Seems like this class contains utilities for working with a `VMATree`. Could it instead be `AllStatic` and take a tree as its first argument? I'd like to avoid inheritance unless necessary (typically for the usage of virtual functions).
>
> It is inherited because it is not a new type. I think that `_tree->visit_reserved_region(par1, par2)` is more readable than `VMATree::visit_reserved_region(_tree, par1, par2)`.
> I could not find any *virtual functions* in these classes, what do you mean by that?

I'm saying that inheritance is mostly useful when we have virtual functions which we can specialize, and that `VMATree` has none.

>> src/hotspot/share/nmt/regionsTree.hpp line 33:
>>
>>> 31: class RegionsTree : public VMATree {
>>> 32: private:
>>> 33: NativeCallStackStorage* _ncs_storage;
>>
>> No need for this to be a pointer.
>
> How to use the ctor with a `bool` arg, then?

The bool argument is just passed along.

```c++
RegionsTree(bool with_storage) : VMATree(), _ncs_storage(with_storage) { }
```

>> src/hotspot/share/nmt/regionsTree.hpp line 108:
>>
>>> 106: CommittedMemoryRegion cmr((address)base, comm_size, st);
>>> 107: comm_size = 0;
>>> 108: prev.clear_node();
>>
>> This is unneeded.
>
> Which part? Why?
> `clear_node()` is the same as `prev = nullptr` if pointers were used.
> `is_valid()` is the same as `prev == nullptr` if pointers were used.

Because you immediately reassign `prev` to `curr` at L114, it's not necessary to first call `clear_node()`.
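To make the reasoning concrete, here is a minimal, self-contained sketch; the `Node` type, its fields, and the loop below are invented for illustration and are not the PR's actual code. Because `prev` is unconditionally overwritten at the end of every iteration, before anything reads it again, clearing it first has no observable effect:

```c++
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the node handle used in the visiting loop.
struct Node {
  int pos = 0;
  bool valid = false;
  void clear_node() { valid = false; }
  bool is_valid() const { return valid; }
};

int main() {
  std::vector<Node> nodes = {{10, true}, {20, true}, {30, true}};
  Node prev; // starts out "cleared"
  for (const Node& curr : nodes) {
    if (prev.is_valid()) {
      printf("region [%d, %d)\n", prev.pos, curr.pos);
      // prev.clear_node(); // redundant: prev is overwritten below before it is read again
    }
    prev = curr; // the reassignment corresponding to L114 in the PR
  }
  return 0;
}
```

The same argument applies whether `prev` is a small node handle, as in the PR, or a raw pointer that would be set to `nullptr`.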
>> src/hotspot/share/nmt/virtualMemoryTrackerWithTree.cpp line 101:
>>
>>> 99: }
>>> 100:
>>> 101: void VirtualMemoryTrackerWithTree::apply_summary_diff(VMATree::SummaryDiff diff) {
>>
>> Applying a summary diff in the `MemoryFileTracker` looks like this:
>>
>> ```c++
>> for (int i = 0; i < mt_number_of_types; i++) {
>>   VirtualMemory* summary = file->_summary.by_type(NMTUtil::index_to_flag(i));
>>   summary->reserve_memory(diff.flag[i].commit);
>>   summary->commit_memory(diff.flag[i].commit);
>> }
>> ```
>>
>> This is much simpler and doesn't require looking at signs and so on.
>
> In `MemoryFileTracker`, the changes in commit/reserve are applied to a local `VirtualMemorySnapshot`. Here we have to apply them to the global `VirtualMemorySummary`.

Yeah, that doesn't seem like a problem.

```c++
for (int i = 0; i < mt_number_of_types; i++) {
  auto r = diff.flag[i].reserve;
  auto c = diff.flag[i].commit;
  MEMFLAGS flag = NMTUtil::index_to_flag(i);
  VirtualMemory* mem = VirtualMemorySummary::as_snapshot()->by_type(flag);
  size_t reserved = mem->reserved();
  size_t committed = mem->committed();
  mem->reserve_memory(r);
  mem->commit_memory(c);
  if ((size_t)-r > reserved) { print_err("release"); }
  if ((size_t)-c > reserved || (size_t)c > committed) { print_err("uncommit"); }
}
```

>> src/hotspot/share/nmt/virtualMemoryTrackerWithTree.hpp line 34:
>>
>>> 32: #include "utilities/ostream.hpp"
>>> 33:
>>> 34: class VirtualMemoryTrackerWithTree : AllStatic {
>>
>> Can this class be redone to use the same pattern as `MemoryFileTracker`, with an outer class that's not static and an inner static class storing an instance of the outer class? That makes testing the class by creating an instance of it possible, unlike our old tests.
>
> Can we do this change as a separate RFE? It impacts the code a lot.

Then just invert it: have the outer class be static and the inner class be an instance. We can change the `MemoryFileTracker` to be that, as it's not as large of a change.

>> src/hotspot/share/nmt/vmatree.hpp line 59:
>>
>>> 57: // Bit fields view: bit 0 for Reserved, bit 1 for Committed.
>>> 58: // Setting a region as Committed preserves the Reserved state.
>>> 59: enum class StateType : uint8_t { Reserved = 1, Committed = 3, Released = 0, COUNT = 4 };
>>
>> Why do we now consider `StateType` to be a bit field?
>
> As the comment above it says, and as we already discussed: the 'Committed' state now means 'Reserved and Committed'. So, when we visit nodes and check for Reserved ones, the Committed nodes are also considered Reserved. Otherwise, we would ignore the Committed nodes and continue looking for Reserved nodes.

Right, I don't remember the discussion on this being a bit field. I do agree that Committed also means Reserved, but it's not necessary to express that as a bit field. Regardless, I leave it up to you how we express this.

>> test/hotspot/gtest/nmt/test_nmt_treap.cpp line 168:
>>
>>> 166: tty->print_cr("d1: %lf, d2: %lf, d2/d1: %lf", d1, d2, d2 / d1);
>>> 167: EXPECT_LT((int)(d2 / d1), N2 / N1);
>>> 168: }
>>
>> 1000 and 10_000 elements are both quite small; isn't a single measurement prone to being slower than expected due to unforeseen circumstances? For example, if `malloc` performs a heavy allocation (by mapping in a few new pages) during the 10_000-element insertion, then that may stall enough that the `EXPECT_LT` fails. We are also limited to performing the test only once.
>>
>> Is your thinking that these perf tests should be kept in, or should they be removed before integrating?
> Generally, I think it is necessary to have performance tests to verify that future changes do not degrade the performance.
> The approach of stress testing can be agreed between us, but the tests have to exist.
> In this approach, the input size is scaled N times and we expect the execution time to grow as O(log(N)), which is much less than N.
> This test fails and we need to have a justification for it. If the approach is invalid or not correctly implemented, we fix it. But the expectations remain the same.

Why would the execution time grow logarithmically when we do linearly more work? When we run this with `N2` we will perform approximately `10_000 * log(10_000, 2)` units of work, and for `N1` approximately `1_000 * log(1_000, 2)` units of work. A closer approximation is `O(log(1)) + O(log(2)) + ... + O(log(n))` for `n = N2` or `n = N1`. Let's calculate that:

```python
>>> import math
>>> def time_it(n):
...     t = 0
...     for i in range(1, n):
...         t = t + math.log(i)
...     return [t, 3*t]  # Approximate best and worst case
...
>>> time_it(1000)
[5905.220423209189, 17715.661269627566]
>>> time_it(10_000)
[82099.71749644217, 246299.15248932652]
>>> def compare(a, b):
...     ab, aw = a
...     bb, bw = b
...     return [bb / ab, bw / aw]
...
>>> compare(time_it(1000), time_it(10_000))
[13.902904821938064, 13.902904821938066]
>>> # Ouch, that's outside of our linear bound!
```

So the expected ratio `d2 / d1` is around 13.9, well above the `N2 / N1 = 10` bound that the test asserts. What I said previously also applies; in particular, I suspect that the performance of `malloc` will affect this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2378929190
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1711105506
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703735921
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1799085027
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703751887
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703739805
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1776820787
PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1703877183