[jira] [Updated] (FLINK-25728) StreamMultipleInputProcessor holds mass CompletableFuture instances under certain scenario

pc wang (Jira) Thu, 20 Jan 2022 03:03:09 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


pc wang updated FLINK-25728:
----------------------------
    Description: 
We have an application contains a broadcast process stage. The none-broadcast 
input has roughly 10 million messages per seconds, and the broadcast side is 
some kind of control stream, rarelly have message follow through. 

After several hours running, the TaskManager will run out of heap memory and 
restart. We reviewed the application code without finding any relevent issue.

We found that the runing to crash time was roughly the same. Then we make a 
heap dump before  the crash, and found mass `CompletableFuture$UniRun` 
instances. 

These `CompletableFuture$UniRun` instances consumes several GibaByte memorys.

 

The following pic is from the heap dump we get from a mock testing stream with 
same scenario.

!image-2022-01-20-18-43-32-816.png|width=1161,height=471!

 

After some source code research. We found that it might be caused by the 
`StreamMultipleInputProcessor.getAvailableFuture()`.

StreamMultipleInputProcessor has multiple

 

  was:
We have an application contains a broadcast process stage. The none-broadcast 
input has roughly 10 million messages per seconds, and the broadcast side is 
some kind of control stream, rarelly have message follow through. 

After several hours running, the TaskManager will run out of heap memory and 
restart. We reviewed the application code without finding any relevent issue.

We found that the runing to crash time was roughly the same. Then we make a 
heap dump before  the crash, and found mass `CompletableFuture$UniRun` 
instances. 

These `CompletableFuture$UniRun` instances consumes several GibaByte memorys.

 

The following pic is from the heap dump we get from a mock testing stream with 
same scenario.

!image-2022-01-20-18-43-32-816.png|width=1161,height=471!

 

After some source code research. We found that it might be caused by the 
`StreamMultipleInputProcessor.getAvailableFuture()`.

 

 


> StreamMultipleInputProcessor holds mass CompletableFuture instances under 
> certain scenario
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25728
>                 URL: https://issues.apache.org/jira/browse/FLINK-25728
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.12.5, 1.13.5, 1.14.2
>            Reporter: pc wang
>            Priority: Major
>         Attachments: image-2022-01-20-18-43-32-816.png
>
>
> We have an application contains a broadcast process stage. The none-broadcast 
> input has roughly 10 million messages per seconds, and the broadcast side is 
> some kind of control stream, rarelly have message follow through. 
> After several hours running, the TaskManager will run out of heap memory and 
> restart. We reviewed the application code without finding any relevent issue.
> We found that the runing to crash time was roughly the same. Then we make a 
> heap dump before  the crash, and found mass `CompletableFuture$UniRun` 
> instances. 
> These `CompletableFuture$UniRun` instances consumes several GibaByte memorys.
>  
> The following pic is from the heap dump we get from a mock testing stream 
> with same scenario.
> !image-2022-01-20-18-43-32-816.png|width=1161,height=471!
>  
> After some source code research. We found that it might be caused by the 
> `StreamMultipleInputProcessor.getAvailableFuture()`.
> StreamMultipleInputProcessor has multiple
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-25728) StreamMultipleInputProcessor holds mass CompletableFuture instances under certain scenario

Reply via email to