[ https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675609#comment-17675609 ]

Yunhong Zheng commented on FLINK-18356:
---------------------------------------

Hi [~martijnvisser] and all. [~godfrey] and I have been trying to track down a possible memory
leak in the table-planner tests. The table-planner ITCases often fail with OOM errors. We
took heap dumps during those runs several times and found that instances of the class
{code:java}
MemoryExecutionGraphInfoStore{code}
can account for more than 50% of the heap (this happened in almost every table-planner ITCase run):

!image-2023-01-11-22-21-57-784.png|width=682,height=191!

 

!image-2023-01-11-22-22-32-124.png|width=736,height=155!

 

This class stores the execution graphs of finished jobs, and an entry is not removed
immediately after its job completes. How long it is kept after completion is controlled
by the option
{code:java}
jobstore.expiration-time{code}
whose default value is 3600 seconds (1 hour).

So we are now trying a smaller value for this option to see whether the OOM error
still occurs. For now we can only run this experiment on our own Azure pipeline, and
it is still in progress.
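To make the retention effect concrete, here is a minimal Java sketch of a time-expiring, in-memory store. It is a hypothetical toy model, not Flink's `MemoryExecutionGraphInfoStore`: the class names `ExpiringJobStore` and `JobStoreRetentionDemo`, the 10 MiB per-job graph size and the rate of one completed job per second are all assumptions chosen for illustration. It only shows how the memory held by such a store grows with the expiration window when many short test jobs finish back to back.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical demo: runs 120 short jobs, one finishing every second, each
// leaving an (assumed) 10 MiB graph in the store, then reports what is retained.
class JobStoreRetentionDemo {
    static final long MIB = 1024L * 1024L;

    static long simulate(long expirationMillis) {
        long[] now = {0L}; // simulated clock in milliseconds
        ExpiringJobStore store = new ExpiringJobStore(expirationMillis, () -> now[0]);
        for (int i = 0; i < 120; i++) {
            store.put("job-" + i, 10 * MIB);
            now[0] += 1_000L; // the next job completes one second later
        }
        return store.retainedBytes();
    }

    public static void main(String[] args) {
        // prints "1200 MiB retained with 1 h expiration"
        System.out.println(simulate(3_600_000L) / MIB + " MiB retained with 1 h expiration");
        // prints "90 MiB retained with 10 s expiration"
        System.out.println(simulate(10_000L) / MIB + " MiB retained with 10 s expiration");
    }
}

// Toy model, NOT Flink's MemoryExecutionGraphInfoStore: keeps each completed
// job's graph until expirationMillis have passed since that job completed.
class ExpiringJobStore {
    private static final class Entry {
        final String jobId;
        final long sizeBytes;
        final long completedAtMillis;

        Entry(String jobId, long sizeBytes, long completedAtMillis) {
            this.jobId = jobId;
            this.sizeBytes = sizeBytes;
            this.completedAtMillis = completedAtMillis;
        }
    }

    // Entries in completion order, so the oldest one is always at the head.
    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long expirationMillis;
    private final LongSupplier clock; // assumed to be non-decreasing

    ExpiringJobStore(long expirationMillis, LongSupplier clock) {
        this.expirationMillis = expirationMillis;
        this.clock = clock;
    }

    // Records a job that has just completed.
    void put(String jobId, long sizeBytes) {
        evictExpired();
        entries.addLast(new Entry(jobId, sizeBytes, clock.getAsLong()));
    }

    // Total size of the graphs that have not expired yet.
    long retainedBytes() {
        evictExpired();
        long total = 0;
        for (Entry e : entries) {
            total += e.sizeBytes;
        }
        return total;
    }

    // Drops every entry whose age has reached the expiration time.
    private void evictExpired() {
        long now = clock.getAsLong();
        while (!entries.isEmpty()
                && now - entries.peekFirst().completedAtMillis >= expirationMillis) {
            entries.removeFirst();
        }
    }
}
```

With a one-hour window every graph from the two-minute run is still held (1200 MiB), while a 10 s window keeps only the nine most recent ones (90 MiB). These figures follow purely from the assumed sizes and rates; they are not measurements from the table-planner ITCases.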

> flink-table-planner Exit code 137 returned from process
> -------------------------------------------------------
>
>                 Key: FLINK-18356
>                 URL: https://issues.apache.org/jira/browse/FLINK-18356
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / Azure Pipelines, Tests
>    Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0
>            Reporter: Piotr Nowojski
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>         Attachments: 1234.jpg, app-profiling_4.gif, image-2023-01-11-22-21-57-784.png, image-2023-01-11-22-22-32-124.png
>
>
> {noformat}
> ============================= test session starts ==============================
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..........                    [  1%]
> pyflink/common/tests/test_execution_config.py .......................    [  5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', arguments 'exec -i -u 1002 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
