On Wed, 10 May 2023 08:29:43 GMT, Yi Yang <yy...@openjdk.org> wrote:

>> Hi, heap dumping pauses application execution (STW), which is a well-known
>> pain point. JDK-8252842 added parallel support to heap dump in an attempt to
>> alleviate this. However, all concurrent threads compete to write heap data
>> to the same file, and extra memory is required to maintain the concurrent
>> buffer queue. In our experiments, we did not see a significant performance
>> improvement from it.
>> 
>> The minor-pause solution presented in this PR is a two-stage segmented heap
>> dump:
>> 
>> 1. Stage One (STW): concurrent threads write data directly to multiple heap
>> files.
>> 2. Stage Two (non-STW): the heap files are merged into one complete heap
>> dump file.
>> 
>> With this approach, concurrent worker threads no longer need to maintain a
>> buffer queue, which would add memory overhead, nor do they compete for
>> locks. It reduces application pause time by 73~80%.
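
A minimal Python sketch of the two stages, under stated assumptions. It is not the C++ HotSpot code in this PR. `two_stage_dump`, `write_segment` and `copy_fd` are hypothetical names, and each worker's share of the heap is stood in for by a plain `bytes` object.

Stage one has each pool thread write its own segment file, so there is no shared buffer queue and no lock. Stage two concatenates the segments in index order. It first tries `os.copy_file_range` (Linux, Python 3.8+) and falls back to a read+write loop, in the spirit of the merge-stage optimization mentioned later in this description.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # bytes per copy call in stage two (arbitrary choice)


def write_segment(path, data):
    """Stage one (hypothetical): one worker writes its share to its own file,
    so workers share no buffer queue and take no lock."""
    with open(path, "wb") as f:
        f.write(data)


def copy_fd(src_fd, dst_fd):
    """Append everything from src_fd to dst_fd at their current offsets."""
    if hasattr(os, "copy_file_range"):  # Linux-only, Python 3.8+
        try:
            # Kernel-side copy; returns 0 at end of file.
            while os.copy_file_range(src_fd, dst_fd, CHUNK) > 0:
                pass
            return
        except OSError:
            pass  # e.g. EXDEV/ENOSYS/EPERM: continue with read+write
    while True:
        buf = os.read(src_fd, CHUNK)
        if not buf:
            return
        view = memoryview(buf)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(dst_fd, view):]


def two_stage_dump(segments, out_path, workers=4):
    """segments: list of bytes, a stand-in for each worker's share of the heap."""
    with tempfile.TemporaryDirectory(prefix="heapdump-") as seg_dir:
        paths = [os.path.join(seg_dir, f"segment-{i}.bin")
                 for i in range(len(segments))]
        # Stage one: in a real dumper this is the part inside the pause.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(write_segment, paths, segments))
        # Stage two: merge the segments, in index order, into one file.
        dst_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            for path in paths:
                src_fd = os.open(path, os.O_RDONLY)
                try:
                    copy_fd(src_fd, dst_fd)
                finally:
                    os.close(src_fd)
        finally:
            os.close(dst_fd)
```

Because the merge runs after stage one finishes, its cost lands in **Total** rather than **STW**. Merging in index order makes the output byte-for-byte equal to concatenating the segments.
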
>> 
>> | Memory | Threads | STW (secs) | Total (secs) |
>> | --- | --- | --- | --- |
>> | 8g | 1 | 15.612 | 15.612 |
>> | 8g | 32 | 2.5617250 | 14.498 |
>> | 8g | 96 | 2.6790452 | 14.012 |
>> | 16g | 1 | 26.278 | 26.278 |
>> | 16g | 32 | 5.2313740 | 26.417 |
>> | 16g | 96 | 6.2445556 | 27.141 |
>> | 32g | 1 | 48.149 | 48.149 |
>> | 32g | 32 | 10.7734677 | 61.643 |
>> | 32g | 96 | 13.1522042 | 61.432 |
>> | 64g | 1 | 100.583 | 100.583 |
>> | 64g | 32 | 20.9233744 | 134.701 |
>> | 64g | 96 | 26.7374116 | 126.080 |
>> | 128g | 1 | 233.843 | 233.843 |
>> | 128g | 32 | 72.9945768 | 207.060 |
>> | 128g | 96 | 67.6815929 | 336.345 |
>> 
>>> **Total** means the total heap dump time, including both phases.
>>> **STW** means the first phase only.
>>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump,
>>> **Total** = **STW**.
>> 
>> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
>> 
>> In actual testing, the two-stage solution can increase the overall heap dump
>> time (see table above). However, given the reduction in STW time, I think it
>> is an acceptable trade-off. Furthermore, there is still room for
>> optimization in the second (merge) stage (e.g.
>> sendfile/splice/copy_file_range instead of a read+write combination). Since
>> the number of...
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   execute VM_HeapDumper directly

Hi, can I have a review for this patch?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1547101136
