> Hi, a heap dump pauses application execution (stop-the-world, STW), which is
> a well-known pain point. JDK-8252842 added parallel support to heap dump in
> an attempt to alleviate this issue. However, all worker threads compete to
> write heap data to the same file, and extra memory is required to maintain
> the concurrent buffer queue. In our experiments, we did not observe a
> significant performance improvement from that.
>
> The minor-pause solution presented in this PR is a two-stage segmented heap
> dump:
>
> 1. Stage One (STW): Concurrent worker threads write data directly to
>    multiple heap files.
> 2. Stage Two (non-STW): The multiple heap files are merged into one complete
>    heap dump file.
>
> Concurrent worker threads are no longer required to maintain a buffer queue,
> which would incur extra memory overhead, nor do they need to compete for
> locks. This significantly reduces application pause time, by 73~80%.
>
> | Memory | Dump threads | STW (s)    | Total (s) |
> | ------ | ------------ | ---------- | --------- |
> | 8g     | 1            | 15.612     | 15.612    |
> | 8g     | 32           | 2.5617250  | 14.498    |
> | 8g     | 96           | 2.6790452  | 14.012    |
> | 16g    | 1            | 26.278     | 26.278    |
> | 16g    | 32           | 5.2313740  | 26.417    |
> | 16g    | 96           | 6.2445556  | 27.141    |
> | 32g    | 1            | 48.149     | 48.149    |
> | 32g    | 32           | 10.7734677 | 61.643    |
> | 32g    | 96           | 13.1522042 | 61.432    |
> | 64g    | 1            | 100.583    | 100.583   |
> | 64g    | 32           | 20.9233744 | 134.701   |
> | 64g    | 96           | 26.7374116 | 126.080   |
> | 128g   | 1            | 233.843    | 233.843   |
> | 128g   | 32           | 72.9945768 | 207.060   |
> | 128g   | 96           | 67.6815929 | 336.345   |
>
>> **Total** is the total heap dump time, including both phases.
>> **STW** is the time of the first phase only.
>> For parallel dump, **Total** = **STW** + **Merge**.
>> For serial dump, **Total** = **STW**.
>
> In actual testing, the two-stage solution can increase the overall heap dump
> time (see the table above). However, given the reduction in STW time, I think
> it is an acceptable trade-off. Furthermore, there is still room for
> optimization in the second (merge) stage, e.g. using
> sendfile/splice/copy_file_range instead of a read+write combination. Since the
> number of parallel dump threads has a considerable impact on total dump time,
> I added a parameter that allows users to specify the number of parallel dump
> threads they wish to run.
>
> ##### Open discussion
>
> - Pauseless heap dump solution?
>   An alternative, pauseless solution is to fork a child process, set the
>   parent process's heap to read-only, and dump the heap in the child process.
>   Once writes happen in the parent process, the child process observes them
>   via userfaultfd, and the corresponding pages are prioritized for dumping.
>   I'm also looking forward to comments and discussion about this solution.
>
> - Client parser support for segmented heap dump
>   This patch raises a possibility: does the heap dump need to be complete at
>   all? Could the VM generate a segmented heap dump directly and let the client
>   parser complete the merge? Looking forward to hearing comments from the
>   Eclipse MAT community.
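The STW reduction and the merge-stage time implied by the benchmark table can be recomputed directly from its rows. The reduction is 1 - STW_parallel / STW_serial, and since Total = STW + Merge for a parallel dump, the merge stage takes Total - STW. The Java sketch below only does that arithmetic on values copied from the 32-thread rows of the table; the class and method names are illustrative, and no new measurements are involved.

```java
/**
 * Worked arithmetic for the benchmark table: STW time saved by the two-stage
 * dump relative to the serial dump, and the non-STW merge time (Total - STW).
 * All inputs are copied from the table; nothing here is newly measured.
 */
public class DumpTableMath {

    /** Percentage of STW time saved by a parallel dump relative to the serial dump. */
    static double stwReductionPercent(double serialStwSecs, double parallelStwSecs) {
        return 100.0 * (1.0 - parallelStwSecs / serialStwSecs);
    }

    /** For a parallel dump, Total = STW + Merge, so Merge = Total - STW. */
    static double mergeSecs(double totalSecs, double stwSecs) {
        return totalSecs - stwSecs;
    }

    public static void main(String[] args) {
        // Each row: {memory in g, serial STW, 32-thread STW, 32-thread Total}
        double[][] rows = {
            {8, 15.612, 2.5617250, 14.498},
            {16, 26.278, 5.2313740, 26.417},
            {32, 48.149, 10.7734677, 61.643},
            {64, 100.583, 20.9233744, 134.701},
            {128, 233.843, 72.9945768, 207.060},
        };
        for (double[] r : rows) {
            System.out.printf("%3.0fg, 32 threads: STW reduced by %.1f%%, merge stage %.1f s%n",
                    r[0], stwReductionPercent(r[1], r[2]), mergeSecs(r[3], r[2]));
        }
    }
}
```

For the 32g row this gives an STW reduction of about 77.6% and a merge stage of about 50.9 s.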
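The two stages described in the quoted PR can be sketched in Java as follows. This is a hypothetical illustration, not the HotSpot implementation: the byte arrays stand in for heap records, and the `.p<i>` segment-file suffix and the `SegmentedDump` class are made up for the example. In a real VM only the first stage would run while the application is stopped. Stage one gives every worker its own file, so writers share no lock or buffer queue. Stage two appends the segments with `FileChannel.transferTo`, which the JDK may implement with an in-kernel copy such as `sendfile` or `copy_file_range` (depending on platform and JDK version) rather than a user-space read+write loop.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical two-stage segmented dump; not the HotSpot implementation. */
public class SegmentedDump {

    /**
     * Stage one (STW in a VM): chunk i is written by a pool thread to its own
     * segment file next to the final file, so writers share no lock or queue.
     */
    static List<Path> writeSegments(Path finalFile, List<byte[]> chunks, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                // Hypothetical naming scheme: "<final file name>.p<i>".
                Path segment = finalFile.resolveSibling(finalFile.getFileName() + ".p" + i);
                byte[] data = chunks.get(i);
                Callable<Path> task = () -> Files.write(segment, data);
                futures.add(pool.submit(task));
            }
            List<Path> segments = new ArrayList<>();
            for (Future<Path> f : futures) {
                segments.add(f.get()); // returns once that segment is fully written
            }
            return segments;
        } finally {
            pool.shutdown();
        }
    }

    /**
     * Stage two (non-STW): write the header, then append each segment in index
     * order with transferTo and delete it once copied.
     */
    static void mergeSegments(Path finalFile, byte[] header, List<Path> segments)
            throws IOException {
        try (FileChannel out = FileChannel.open(finalFile, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(header);
            while (buf.hasRemaining()) {
                out.write(buf);
            }
            for (Path segment : segments) {
                try (FileChannel in = FileChannel.open(segment, StandardOpenOption.READ)) {
                    long pos = 0;
                    long size = in.size();
                    // transferTo may move fewer bytes than requested, so loop until done.
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, out);
                    }
                }
                Files.delete(segment);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path finalFile = Files.createTempDirectory("segdump").resolve("heap.hprof");
        List<byte[]> chunks = List.of(
                "segment-0;".getBytes(StandardCharsets.US_ASCII),
                "segment-1;".getBytes(StandardCharsets.US_ASCII),
                "segment-2;".getBytes(StandardCharsets.US_ASCII));
        List<Path> segments = writeSegments(finalFile, chunks, 3);
        mergeSegments(finalFile, "HEADER;".getBytes(StandardCharsets.US_ASCII), segments);
        System.out.println(Files.readString(finalFile));
    }
}
```

Running `main` prints `HEADER;segment-0;segment-1;segment-2;`. Because segments are appended in index order only after every worker has finished, the merged layout does not depend on which worker finishes first.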
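On the open question of letting the client parser complete the merge: a parser could, in principle, read the parts as one logical stream without ever materializing a merged file. The sketch below assumes, hypothetically, that the first part holds the dump header and every later part holds only complete records, so that plain concatenation reproduces the merged file. `SegmentedDumpReader` and `openLogicalDump` are illustrative names, not part of Eclipse MAT or any other existing parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical client-side view of a segmented dump as one logical stream. */
public class SegmentedDumpReader {

    /** Opens the parts in order and chains them; closing the result closes every part. */
    static InputStream openLogicalDump(List<Path> parts) throws IOException {
        List<InputStream> streams = new ArrayList<>();
        try {
            for (Path part : parts) {
                streams.add(Files.newInputStream(part));
            }
        } catch (IOException e) {
            // Do not leak the streams that were already opened.
            for (InputStream s : streams) {
                s.close();
            }
            throw e;
        }
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segread");
        Path header = Files.write(dir.resolve("heap.hprof"),
                "HEADER;".getBytes(StandardCharsets.US_ASCII));
        Path seg0 = Files.write(dir.resolve("heap.hprof.p0"),
                "segment-0;".getBytes(StandardCharsets.US_ASCII));
        Path seg1 = Files.write(dir.resolve("heap.hprof.p1"),
                "segment-1;".getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = openLogicalDump(List.of(header, seg0, seg1))) {
            // A real parser would decode records here; this demo just reads the bytes back.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }
}
```

Running `main` prints `HEADER;segment-0;segment-1;`. Whether real segment files could be concatenated this way depends on how the VM lays out headers and records in each part, which the quoted PR leaves open.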
Yi Yang has updated the pull request incrementally with two additional commits since the last revision:

 - max_path check
 - fix test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/620d94dc..00b49e4e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=00-01

  Stats: 21 lines in 1 file changed: 12 ins; 5 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667