Hi, I spent some time on this issue and gathered my findings here [1].

ConradJam, if you have some time, could you try running your job with the
following dynamic params added to `env.java.opts.taskmanager` in `config.yaml`:

-Dorg.apache.flink.shaded.netty4.io.netty.tryReflectionSetAccessible=true
-Dorg.apache.flink.shaded.netty4.io.netty.leakDetection.level=PARANOID

Then see whether it makes things better and/or gives some info about the leak.

Thanks,
Ferenc

[1] https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17911219

On Tuesday, January 7th, 2025 at 03:44, kerr <hepin1...@gmail.com> wrote:

> I will try to do a ping-pong test this weekend to find out if there are
> some leaks on the pekko side.
>
> One change from the old transport is that the new one uses
> pooledByteBufAllocator by default.
>
> 何品
>
> PJ Fanning fannin...@gmail.com wrote on Tue, Jan 7, 2025 at 00:42:
>
> > We don't have any changes in 1.1.3 that might help Flink.
> >
> > We haven't yet tracked down a reproducible case that doesn't involve
> > running the Flink tests. See
> > https://github.com/apache/pekko/issues/1634
> >
> > On Mon, 6 Jan 2025 at 16:17, Alexander Fedulov
> > alexander.fedu...@gmail.com wrote:
> >
> > > @PJ Fanning,
> > > Do you have a rough estimate when Pekko 1.1.3 might get released?
> > >
> > > Best,
> > > Alex
> > >
> > > On Mon, 30 Dec 2024 at 04:12, ConradJam czy...@apache.org wrote:
> > >
> > > > By the way, I tried using Flink 1.20 to consume Kafka and write Iceberg
> > > > data, but with the default parameters it quickly ran into an OOM. I
> > > > had to increase the taskmanager off-heap memory. After that, I ran it
> > > > for 3 days and observed that the Outside JVM Memory of the Taskmanager
> > > > continued to slowly increase until OOM, which may also be affected by
> > > > this issue.
> > > >
> > > > Matthias Pohl map...@apache.org wrote on Sun, Dec 29, 2024 at 21:22:
> > > >
> > > > > fyi: The following Flink Jira issues are related to this comment:
> > > > > - FLINK-36290 [1]: OOM in CI
> > > > > - FLINK-36510 [2]: netty version bump which was backported to 1.20
> > > > >   and 1.19
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-36290
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-36510
> > > > >
> > > > > On Sat, Dec 28, 2024 at 4:02 PM PJ Fanning fannin...@apache.org
> > > > > wrote:
> > > > >
> > > > > > It is recommended that you revert back to Pekko 1.0 which uses
> > > > > > Netty 3.
> > > > > >
> > > > > > We will notify you when Pekko 1.1.3 is released.
> > > > > >
> > > > > > On 2024/12/28 14:54:02 kerr wrote:
> > > > > >
> > > > > > > First, super sorry about this.
> > > > > > > https://github.com/apache/pekko/pull/1635
> > > > > > >
> > > > > > > I just noticed this issue and did a quick check about it.
> > > > > > > The memory leak is because I forgot the `ByteBuf#release` call.
> > > > > > >
> > > > > > > We will try to work out a release soon on 1.1.3
> > > > > > >
> > > > > > > 何品
> > > >
> > > > --
> > > > Best
> > > >
> > > > ConradJam