[ 
https://issues.apache.org/jira/browse/FLINK-29985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-29985:
--------------------------------------
    Description: 
When a TaskManager (TM) is stopped by the ResourceManager (RM), its slot table 
is closed, causing all its slots to be released.
However, when a TM is stopped by SIGTERM (i.e. by an external resource 
manager), its slot table is NOT closed.
 

When a slot is released, the associated resources are released as well, in 
particular the MemoryManager.
The MemoryManager might hold not only memory, but also arbitrary shared 
resources (currently, PythonSharedResources and RocksDBSharedResources).
As of now, RocksDBSharedResources contains only ephemeral resources. It is not 
clear whether the same holds for PythonSharedResources, but it is likely 
associated with a separate process.
That means that in standalone clusters, some resources might not be released 
when a TM is stopped by SIGTERM.
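
To illustrate the gap described above, here is a minimal Java sketch of one 
generic way to have a table of closeable resources released on SIGTERM: a JVM 
shutdown hook. This is not Flink's code and not a proposed patch. 
SlotTableShutdownSketch, SlotTable, SharedResource and closeOnJvmShutdown are 
hypothetical names made up for this example; only Runtime.addShutdownHook is a 
real JDK API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SlotTableShutdownSketch {

    /** Hypothetical stand-in for a shared resource held on behalf of a slot. */
    static final class SharedResource implements AutoCloseable {
        private boolean closed;

        @Override
        public synchronized void close() {
            closed = true;
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical, simplified slot table: closing it releases every slot's resource. */
    static final class SlotTable implements AutoCloseable {
        private final Map<Integer, SharedResource> slots = new LinkedHashMap<>();
        private boolean closed;

        synchronized SharedResource allocate(int slotIndex) {
            if (closed) {
                throw new IllegalStateException("slot table is closed");
            }
            return slots.computeIfAbsent(slotIndex, i -> new SharedResource());
        }

        @Override
        public synchronized void close() {
            if (closed) {
                return; // idempotent: a regular stop and the hook may both call close()
            }
            closed = true;
            slots.values().forEach(SharedResource::close);
            slots.clear();
        }
    }

    /** Registers a JVM shutdown hook that closes the table and returns the hook thread. */
    static Thread closeOnJvmShutdown(SlotTable table) {
        Thread hook = new Thread(table::close, "slot-table-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        SlotTable table = new SlotTable();
        table.allocate(0);
        // The JVM runs registered hooks on normal exit and on SIGTERM,
        // but not when the process is killed with SIGKILL.
        closeOnJvmShutdown(table);
    }
}
```

Making close() idempotent means the regular stop path and the shutdown hook can 
both call it safely. Resources tied to a separate process would still leak on 
SIGKILL, since the JVM does not run shutdown hooks in that case.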

  was:
When a slot is released, the associated resources are released as well, in 
particular, MemoryManager. MemoryManager might hold not only memory, but also 
some arbitrary shared resources (currently, PythonSharedResources and 
RocksDBSharedResources).

When TM is stopped by JManager, its slot table is closed, causing all its slots 
to be released.

When TM is stopped by SIGTERM (i.e. external resource manager), its slot table 
is NOT closed.

That means that in standalone clusters, some resources might not be released.

 

As of now, RocksDBSharedResources contains only ephemeral resources.

Not sure about PythonSharedResources, but likely it is associated with a 
separate process.


> TaskManager doesn't close SlotTable on SIGTERM
> ----------------------------------------------
>
>                 Key: FLINK-29985
>                 URL: https://issues.apache.org/jira/browse/FLINK-29985
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.16.0, 1.15.3
>            Reporter: Roman Khachatryan
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
