On Wed, 19 Jul 2023 12:42:27 GMT, Yi Yang <yy...@openjdk.org> wrote:

>> ### Motivation and proposal
>> Hi, heap dump brings about pauses for application's execution(STW), this is 
>> a well-known pain. JDK-8252842 have added parallel support to heapdump in an 
>> attempt to alleviate this issue. However, all concurrent threads 
>> competitively write heap data to the same file, and more memory is required 
>> to maintain the concurrent buffer queue. In experiments, we did not feel a 
>> significant performance improvement from that.
>> 
>> The minor-pause solution, which is presented in this PR, is a two-phase 
>> segmented heap dump:
>> 
>> - Phase 1(STW): Concurrent threads directly write data to multiple heap 
>> files.
>> - Phase 2(Non-STW): Merge multiple heap files into one complete heap dump 
>> file. This process can happen outside safepoint.
>> 
>> Now concurrent worker threads are not required to maintain a buffer queue, 
>> which would result in more memory overhead, nor do they need to compete for 
>> locks. The changes in the overall design are as follows:
>> 
>> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
>> <p align="center">Fig1. Before</p>
>> 
>> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
>> <p align="center">Fig2. After this patch</p>
>> 
>> ### Performance evaluation
>> | memory | numOfThread | CompressionMode | STW | Total |
>> | -------| ----------- | --------------- | --- | ---- |
>> | 8g | 1 T | N | 15.612 | 15.612 |
>> | 8g | 32 T | N | 2.561725 | 14.498 |
>> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
>> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
>> | 8g | 96 T | N | 2.6790452 | 14.012 |
>> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
>> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
>> | 16g | 1 T | N | 26.278 | 26.278 |
>> | 16g | 32 T | N | 5.231374 | 26.417 |
>> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
>> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
>> | 16g | 96 T | N | 6.2445556 | 27.141 |
>> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
>> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
>> | 32g | 1 T | N | 48.149 | 48.149 |
>> | 32g | 32 T | N | 10.7734677 | 61.643 |
>> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
>> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
>> | 32g | 96 T | N | 13.1522042 | 61.432 |
>> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
>> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
>> | 64g | 1 T | N | 100.583 | 100.583 |
>> | 64g | 32 T | N | 20.9233744 | 134.701 |
>> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
>> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
>> | 64g | 96 T | N | 26.7374116 | 126.08 |
>> | 64g | ...
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   test failure on Windows

Hi, a few more points today, but I'm getting there. 8-)

1
The previous parallel dump change:
8252842: Extend jmap to support parallel heap dump
..the notes suggest it was going to have a CSR for the new option, but was then 
decided not to add the option to specify the number of threads.
That CSR was https://bugs.openjdk.org/browse/JDK-8260424 which was withdrawn, 
but looks like we should do one for this change.
If we had a reasonable expectation of what the best number to use was, we could 
do the same thing here.  If we might get that wrong, we will need the user to 
be able to specify.
Actually are we changing the default behaviour: from "parallel if you can", to 
"serial unless you specify".


2
Running the changes, I got a test failure on macos debug (x64 and aarch64):

test/hotspot/jtreg/serviceability/HeapDump/IntegrityHeapDumpTest.java

On different machines, one x64 and one aarch64, got the same kind of error:
----------stderr:(6/609)----------
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/.java_pid36008: target process 
36008 doesn't respond within 10500ms or HotSpot VM not loaded
        at 
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:95)
        at 
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
        at 
jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
        at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
        at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
        
..so the test fails with: java.lang.RuntimeException: Expected to get exit 
value of [0], exit value is: [1]

Other tests e.g. test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpTest.java 
run OK.

The failing test IntegrityHeapDumpTest.java extends LingeredApp.
test/hotspot/jtreg/serviceability/HeapDump/DuplicateArrayClassesTest.java does 
this, and runs and passes.

I don't yet understand the failure.  Could things have gone wrong in macosx 
debug builds in the attach listener.. I didn't get a stack of the LingeredApp 
under test, but will try again.


3
Docs
https://openjdk.org/guide/#release-notes
This enhancement should probably have a release note, and we could briefly 
explain what parallel means.  
We probably need to still recommend it is not used on a production server 
unless you have tested the impact and are happy it does not disturb the 
application.
Maybe we can offer a recommended usage, a guide number of threads as a starting 
point.
We also need to explain when this can or cannot be used (SerialGC, any other 
restrictions? It's not a problem that SerialGC does not use this, just that we 
should make it clear).  
I can help with the wording if needed, if you'd like to start something (this 
might make it into other docs also.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1659249670

Reply via email to