On Tue, 1 Aug 2023 05:50:42 GMT, Yi Yang <yy...@openjdk.org> wrote: >> ### Motivation and proposal >> Hi, heap dump brings about pauses for application's execution(STW), this is >> a well-known pain. JDK-8252842 have added parallel support to heapdump in an >> attempt to alleviate this issue. However, all concurrent threads >> competitively write heap data to the same file, and more memory is required >> to maintain the concurrent buffer queue. In experiments, we did not feel a >> significant performance improvement from that. >> >> The minor-pause solution, which is presented in this PR, is a two-phase >> segmented heap dump: >> >> - Phase 1(STW): Concurrent threads directly write data to multiple heap >> files. >> - Phase 2(Non-STW): Merge multiple heap files into one complete heap dump >> file. This process can happen outside safepoint. >> >> Now concurrent worker threads are not required to maintain a buffer queue, >> which would result in more memory overhead, nor do they need to compete for >> locks. The changes in the overall design are as follows: >> >>  >> <p align="center">Fig1. Before</p> >> >>  >> <p align="center">Fig2. After this patch</p> >> >> ### Performance evaluation >> | memory | numOfThread | CompressionMode | STW | Total | >> | -------| ----------- | --------------- | --- | ---- | >> | 8g | 1 T | N | 15.612 | 15.612 | >> | 8g | 32 T | N | 2.561725 | 14.498 | >> | 8g | 32 T | C1 | 2.3084878 | 14.198 | >> | 8g | 32 T | C2 | 10.9355128 | 21.882 | >> | 8g | 96 T | N | 2.6790452 | 14.012 | >> | 8g | 96 T | C1 | 2.3044796 | 3.589 | >> | 8g | 96 T | C2 | 9.7585151 | 20.219 | >> | 16g | 1 T | N | 26.278 | 26.278 | >> | 16g | 32 T | N | 5.231374 | 26.417 | >> | 16g | 32 T | C1 | 5.6946983 | 6.538 | >> | 16g | 32 T | C2 | 21.8211105 | 41.133 | >> | 16g | 96 T | N | 6.2445556 | 27.141 | >> | 16g | 96 T | C1 | 4.6007096 | 6.259 | >> | 16g | 96 T | C2 | 19.2965783 | 39.007 | >> | 32g | 1 T | N | 48.149 | 48.149 | >> | 32g | 32 T | N | 10.7734677 | 61.643 | >> | 32g | 32 T | C1 | 10.1642097 | 10.903 | >> | 32g | 32 T | C2 | 43.8407607 | 88.152 | >> | 32g | 96 T | N | 13.1522042 | 61.432 | >> | 32g | 96 T | C1 | 9.0954641 | 9.885 | >> | 32g | 96 T | C2 | 38.9900931 | 80.574 | >> | 64g | 1 T | N | 100.583 | 100.583 | >> | 64g | 32 T | N | 20.9233744 | 134.701 | >> | 64g | 32 T | C1 | 18.5023784 | 19.358 | >> | 64g | 32 T | C2 | 86.4748377 | 172.707 | >> | 64g | 96 T | N | 26.7374116 | 126.08 | >> | 64g | ... > > Yi Yang has updated the pull request incrementally with one additional commit > since the last revision: > > test failure on mac
Thanks for the update - Right, on the CSR I may have followed the links too far and been comparing with jmap features, not jcmd features. 8-) On the default behaviour, running this change, with no -parallel= option, I have been seeing it default to a non-parallel dump, but now I see why: I see num_dump_thread being set (e.g. 12 on my test system), but I see num_active_workers = 1 (requested is 12). I just saw a run with no -parallel option which got num_active_workers = 2, so it did a parallel dump, so the default is being set as intended, but the availability of worker threads is not always predictable! -Xlog:gc shows the GC was using fewer workers than expected. So that suggested -XX:-UseDynamicNumberOfGCThreads and yes, with that I see e.g. num_active_workers = 23. req_num_dump_thread = 12 for my test. And all 12 threads work on the heap dump. ------------------- test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpTest.java: we can't see if parallel dumping is happening Can we add, like in I think another test: run testng/othervm -Xlog:heapdump HeapDumpTest I would also suggest -XX:-UseDynamicNumberOfGCThreads otherwise as above, the GC can scale back and you don't always get a parallel dump when you expect it. ------------------- Do we really want -parallel=-1 ..to be a valid input, and to use the workers, when it's clearly an error. HeapDumpDCmd::execute casts it to a unit so we lose the fact that it was a request for -1. Can we get the signed value in HeapDumpDCmd::execute, print a message and return like we do when compression level is out of range? (Passing a non-number when an integer is expected will cause the jcmd to fail, and that seems correct.) ------------------- So I think I'm done, looks good. 8-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1660469424