On Thu, 1 Aug 2024 15:44:32 GMT, Afshin Zafari <azaf...@openjdk.org> wrote:
> - `VMATree` is used instead of `SortedLinkList` in new class > `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls > easier. > - Both old and new versions exist in the code and can be selected via > `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further > PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed > memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added > later. > - Some performance tests are added for new version, VMATree and Treap, to > show the idea and should be improved later. Based on the results of comparing > speed of VMATree and VMT, VMATree shows ~40x faster response time. Changes requested by gziemski (Reviewer). Changes requested by gziemski (Reviewer). More review feedback... Changes requested by gziemski (Reviewer). Are we keeping both the linked list based and VMATree based implementations just for the review process, and we will drop the linked list one, once we are done with the review, or are they both going to be checked in? Is there a concrete advantage here for using lambda expression when iterating committed regions? I'm asking because personally I find while ((committed_rgn = itr.next()) != nullptr) { print_committed_rgn(committed_rgn); } simpler and more compact, hence easier to understand, than CommittedMemoryRegion cmr; VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, &cmr, [&](CommittedMemoryRegion* crgn) { print_committed_rgn(crgn); return true; }); Same here: if (MemTracker::is_using_tree()) { CommittedMemoryRegion cmr; bool reserved_and_committed = false; VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, &cmr, [&](CommittedMemoryRegion* committed_rgn) { if (committed_rgn->size() == reserved_rgn->size() && committed_rgn->call_stack()->equals(*stack)) { // One region spanning the entire reserved region, with the same stack trace. // Don't print this regions because the "reserved and committed" line above // already indicates that the region is committed. reserved_and_committed = true; return false; } return true; }); if (reserved_and_committed) return; } } We are setting `reserved_and_committed` inside lambda, then we need to check it outside. It's messy and harder to follow which `return` belongs to which context. > > Is there a concrete advantage here for using lambda expression when > > iterating committed regions? I'm asking because personally I find > > ```c++ > > while ((committed_rgn = itr.next()) != nullptr) { > > print_committed_rgn(committed_rgn); > > } > > ``` > > > > > > > > > > > > > > > > > > > > > > > > simpler and more compact, hence easier to understand, than > > ```c++ > > CommittedMemoryRegion cmr; > > > > VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, > > &cmr, [&](CommittedMemoryRegion* crgn) { > > print_committed_rgn(crgn); > > return true; > > }); > > ``` > > Hi Gerard, I'm the one who prefers passing in a lambda and introduced that > style, sorry :-). > > First off, I think the lambda code should be simplified, it should be: > > ```c++ > VMWT::tree().visit_committed_regions_of(*reserved_rgn, [](const > CommittedMemoryRegion& crgn) { > print_committed_region(crgn)); > return true; > }); > ``` > > I don't think that's too bad, right? The `return true` is a bit unfortunate. > It's for being able to stop early, which I agree that regular iterators do > better by just performing a `break;`. > > I'll go ahead and say one nice thing too: If the body of the lambda is a bit > more complicated, then we can look at the capture list (the `[]` above) to > see what the lambda can alter in the outside scope. With a while-loop, you > really have to read the whole thing. > > **The reason** that I write these kinds of iterators is that they're much > simpler to implement and maintain and, _to me_, STL-style iterators aren't > easier to read, it's about the same level of complexity. > > I'll admit that your style of iterators (which are _not_ STL) is about the > same complexity, though I find it unfortunate that we have to write an entire > class for each iterator. > > With the simplifications I made, is this style of iterator acceptable? hi Johan, Yes, this does look better. I looked at https://www.geeksforgeeks.org/when-to-use-lambda-expressions-instead-of-functions-in-cpp to see when one should consider using lambda and I like one particular scenario - improving readability by implementing the code locally, so one can see exactly what is going on without having to look inside a function. I think this is our use case scenario here. If only we didn't need all those `return` statements, I'd have clearly preferred lambda here in fact. Is the plan to check-in the fix with both paths? Or are we going to remove the linked-list based one after the review? Did you consider using the single lambda in `report_virtual_memory_region()` as I suggested? outputStream* out = output(); const char* scale = current_scale(); auto print_rgn = [&](const VirtualMemoryRegion* rgn, const NativeCallStack* stack, MEMFLAGS mem_type = MEMFLAGS::mtNone, const char* region_type = "committed", bool committed = true) { // Don't report if size is too small if (amount_in_current_scale(rgn->size()) == 0) return; out->cr(); INDENT_BY(committed?8:0, print_virtual_memory_region(region_type, rgn->base(), rgn->size()); if (stack->is_empty()) { out->cr(); } else { if (!committed) { out->print(" for %s", NMTUtil::flag_to_name(mem_type)); } out->print_cr(" from "); INDENT_BY(4, _stackprinter.print_stack(stack);) } ) }; and then: bool all_committed = reserved_rgn->size() == reserved_rgn->committed_size(); const char* region_type = (all_committed ? "reserved and committed" : "reserved"); print_rgn(reserved_rgn, reserved_rgn->call_stack(), reserved_rgn->flag(), region_type, false); and if (MemTracker::is_using_sorted_link_list()) { CommittedRegionIterator itr = reserved_rgn->iterate_committed_regions(); CommittedMemoryRegion* crgn; while ((crgn = itr.next()) != nullptr) { print_rgn(crgn, crgn->call_stack()); } } if (MemTracker::is_using_tree()) { VirtualMemoryTrackerWithTree::Instance::tree()->visit_committed_regions(*reserved_rgn, [&](CommittedMemoryRegion& crgn) { print_rgn(&crgn, crgn.call_stack()); return true; }); } I honestly do not think we should be checking in 2 implementations, unless they are nicely hidden away. I do not like the way we manage them right now with 2 explicit `if` checks, each and every time we need to do something. If you could push the 2 implementations into a factory class, so that instead of: if (is_using_sorted_link_list()) VirtualMemoryTracker::snapshot_thread_stacks(); if (is_using_tree()) VirtualMemoryTrackerWithTree::Instance::snapshot_thread_stacks(); we have just: ` VirtualMemoryTracker::snapshot_thread_stacks(); ` and VirtualMemoryTracker::VirtualMemoryTracker(bool useLinkedList) { if (useLinkedList) { instance = new VirtualMemoryTrackerWithLinkedList() } else { instance = new VirtualMemoryTrackerWithLinkedList() } } VirtualMemoryTracker::snapshot_thread_stacks() { instance.snapshot_thread_stacks(); } I still haven't run the benchmarks to verify the speed increase promise. Ideally, I would like to do this with my mechanism, but will only do it, if I can manage it without getting sucked into working on the mechanism itself. I will definitively want to run the provided microbenchmarks though. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 279: > 277: "Cannot commit verification bitmap > memory"); > 278: } > 279: MemTracker::record_virtual_memory_type(verify_bitmap.base(), > verify_bitmap..size(), mtGC); Does this even compile? src/hotspot/share/nmt/memReporter.cpp line 440: > 438: if (all_committed) { > 439: if (MemTracker::is_using_sorted_link_list()) { > 440: CommittedRegionIterator itr = > reserved_rgn->iterate_committed_regions(); Indentation. src/hotspot/share/nmt/memReporter.cpp line 467: > 465: > 466: if (reserved_and_committed) > 467: return; This looks better, but now that I got more comfortable with anonymous local functions using lambda, I think we can do better. We can write a single `print_rgn` lambda function that we can use in both "reserved" and "committed" cases: outputStream* out = output(); const char* scale = current_scale(); auto print_rgn = [&](const VirtualMemoryRegion* rgn, const NativeCallStack* stack, MEMFLAGS mem_type = MEMFLAGS::mtNone, const char* region_type = "committed", bool committed = true) { // Don't report if size is too small if (amount_in_current_scale(rgn->size()) == 0) return; out->cr(); INDENT_BY(committed?8:0, print_virtual_memory_region(region_type, rgn->base(), rgn->size()); if (stack->is_empty()) { out->cr(); } else { if (!committed) { out->print(" for %s", NMTUtil::flag_to_name(mem_type)); } out->print_cr(" from "); INDENT_BY(4, _stackprinter.print_stack(stack);) } ) }; src/hotspot/share/nmt/memReporter.cpp line 478: > 476: stack = committed_rgn->call_stack(); > 477: out->cr(); > 478: INDENT_BY(8, This entire section looks messy and in need of some cleanup. We have lines that are only relevant to one implementation not the other, used in both. We share `committed_rgn` name between common path and old impl, then use `crgn` in the new impl. May I suggest something like: auto print_committed_rgn = [&](CommittedMemoryRegion* crgn) { // Don't report if size is too small if (amount_in_current_scale(crgn->size()) == 0) return; stack = crgn->call_stack(); out->cr(); INDENT_BY(8, print_virtual_memory_region("committed", crgn->base(), crgn->size()); if (stack->is_empty()) { out->cr(); } else { out->print_cr(" from"); INDENT_BY(4, stack->print_on(out);) } ) }; if (MemTracker::is_using_sorted_link_list()) { CommittedRegionIterator itr = reserved_rgn->iterate_committed_regions(); CommittedMemoryRegion* crgn; while ((crgn = itr.next()) != nullptr) { print_committed_rgn(crgn); } } if (MemTracker::is_using_tree()) { CommittedMemoryRegion cmr; VirtualMemoryTrackerWithTree::tree()->visit_committed_regions(reserved_rgn, &cmr, [&](CommittedMemoryRegion* crgn) { print_committed_rgn(crgn); return true; }); } src/hotspot/share/nmt/memReporter.cpp line 482: > 480: } else { > 481: out->print_cr(" from"); > 482: INDENT_BY(4, stack->print_on(out);) Why aren't we using `_stackprinter` here to print, like in all the other places? ` INDENT_BY(4, _stackprinter.print_stack(stack);) ` src/hotspot/share/nmt/memTracker.hpp line 67: > 65: if (is_using_tree()) > 66: VirtualMemoryTrackerWithTree::Instance::snapshot_thread_stacks(); > 67: } I'd prefer to see just: static void snapshot_thread_stacks() { VirtualMemoryTracker::snapshot_thread_stacks(); } and: static void VirtualMemoryTracker::snapshot_thread_stacks() { if (is_using_sorted_link_list()) VirtualMemoryTrackerWithLinkedList::Instance::snapshot_thread_stacks(); if (is_using_tree()) VirtualMemoryTrackerWithTree::Instance::snapshot_thread_stacks(); } Basically hide the two implementations and then we don't need to expose them. All the APIs such as `is_using_tree()` would not need to be public, but just implementation detail. If we wanted to go one step further, we could create an instance of either one at the init time, then keep using that instance, instead of checking each time we want to do something. src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 94: > 92: if (si._stack_index < 0 || si._stack_index >= _stacks.length()) { > 93: return _fake_stack; > 94: } Is that a leftover from debugging? Shouldn't this be an `assert` in final code? src/hotspot/share/nmt/nmtTreap.hpp line 219: > 217: } > 218: last_seen = node; > 219: return true; Why are we returning a `bool` value in `void` function? src/hotspot/share/nmt/nmtTreap.hpp line 320: > 318: } > 319: head = to_visit.pop(); > 320: if(!f(head)) Needs space `if (!f(head))` src/hotspot/share/nmt/nmtTreap.hpp line 348: > 346: const int cmp_to = COMPARATOR::cmp(head->key(), to); > 347: if (cmp_from >= 0 && cmp_to < 0) { > 348: if(!f(head)) Needs space `if (!f(head))` src/hotspot/share/nmt/nmtTreap.hpp line 356: > 354: head = nullptr; > 355: } > 356: Remove. ------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2228842744 PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2236512021 Changes requested by gziemski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2246166901 PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2331868978 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2276641131 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2276654335 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2276656880 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2278278886 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2307806504 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2316185120 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2377524395 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2377532616 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1777443492 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1710280816 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1717432884 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1710318264 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1715887989 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722171034 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722186507 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1777449328 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722187617 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722188013 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1722188713