[ https://issues.apache.org/jira/browse/FLINK-35737 ]


    Jon Cavallie Mester deleted comment on FLINK-35737:
    ---------------------------------------------

was (Author: JIRAUSER311063):
Hi, I’d like to take this issue.

I can reproduce the problem locally by enabling
`-Dflink.tests.check-segment-multiple-free=true`, which matches how CI runs. In 
that mode, `testCallCleanerOnceOnConcurrentFree` prints `IllegalStateException` 
stack traces from the losing `free()` call, even though the test itself still 
passes.

Proposed fix (test-only):
Wrap the two `segment.free()` calls in try/catch, swallow the expected 
`IllegalStateException`, and record any other unexpected `Throwable`. The 
existing assertion that the cleaner runs exactly once remains unchanged. This 
removes the noisy stack traces without changing production code.
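
A minimal sketch of that test shape, for illustration only. `FakeSegment` and `runConcurrentFree` are hypothetical stand-ins and not Flink's actual `MemorySegment` or test code; the sketch simply assumes that a second `free()` throws `IllegalStateException` when the multiple-free check is enabled:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class ConcurrentFreeSketch {

    /** Hypothetical stand-in for a segment with the multiple-free check enabled. */
    static final class FakeSegment {
        private final AtomicBoolean freed = new AtomicBoolean(false);
        private final Runnable cleaner;

        FakeSegment(Runnable cleaner) {
            this.cleaner = cleaner;
        }

        void free() {
            // Only the first caller wins; every later caller gets an exception.
            if (!freed.compareAndSet(false, true)) {
                throw new IllegalStateException("segment already freed");
            }
            cleaner.run();
        }
    }

    /** Frees one segment from two threads and returns how often the cleaner ran. */
    public static int runConcurrentFree(AtomicReference<Throwable> unexpected)
            throws InterruptedException {
        AtomicInteger cleanerCalls = new AtomicInteger();
        FakeSegment segment = new FakeSegment(cleanerCalls::incrementAndGet);
        CountDownLatch start = new CountDownLatch(1);

        Runnable freeTask = () -> {
            try {
                start.await();
                segment.free();
            } catch (IllegalStateException expected) {
                // Expected for the losing free() call; swallowed so no stack trace is printed.
            } catch (Throwable t) {
                // Anything else is unexpected; keep the first one for the caller to check.
                unexpected.compareAndSet(null, t);
            }
        };

        Thread first = new Thread(freeTask);
        Thread second = new Thread(freeTask);
        first.start();
        second.start();
        start.countDown(); // release both threads together
        first.join();
        second.join();
        return cleanerCalls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> unexpected = new AtomicReference<>();
        int calls = runConcurrentFree(unexpected);
        if (unexpected.get() != null) {
            throw new AssertionError("unexpected failure", unexpected.get());
        }
        if (calls != 1) {
            throw new AssertionError("cleaner ran " + calls + " times");
        }
    }
}
```

The cleaner-count check at the end corresponds to the existing assertion; only the handling around the two `free()` calls is new.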

If there are no objections, please assign this ticket to me and I’ll open a PR.


> Prevent Memory Leak by Closing MemoryExecutionGraphInfoStore on MiniCluster 
> Shutdown
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-35737
>                 URL: https://issues.apache.org/jira/browse/FLINK-35737
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 2.1.0
>            Reporter: Feng Jiajie
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 2.2.0
>
>
> MemoryExecutionGraphInfoStore registers a ShutdownHook upon construction and 
> deregisters it within its close() method.
> {code:java}
> public MemoryExecutionGraphInfoStore(...) {
>     ...
>     this.shutdownHook =
>             ShutdownHookUtil.addShutdownHook(this, getClass().getSimpleName(), LOG);
> }
>
> @Override
> public void close() throws IOException {
>     ...
>     // Remove shutdown hook to prevent resource leaks
>     ShutdownHookUtil.removeShutdownHook(shutdownHook, getClass().getSimpleName(), LOG);
> }
> {code}
> Currently, MiniCluster instantiates a MemoryExecutionGraphInfoStore object 
> but doesn't retain a reference to it, nor does it call close() during its own 
> shutdown process.
> {code:java}
> final DispatcherResourceManagerComponent dispatcherResourceManagerComponent =
>         dispatcherResourceManagerComponentFactory.create(
>                 ...
>                 new MemoryExecutionGraphInfoStore(), // created inline; no reference is retained
>                 ...);
> {code}
> This behavior leads to an accumulation of ShutdownHooks when running multiple 
> Flink jobs within the same local JVM. These accumulating hooks, along with 
> their associated references, contribute to a memory leak.
> This patch addresses the issue by ensuring that 
> MemoryExecutionGraphInfoStore's close() method is invoked during MiniCluster 
> shutdown.
> https://github.com/apache/flink/pull/25009



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
