Hi,

I spent some time on this issue and gathered my findings here [1].

ConradJam, if you have some time, could you try running your job
with the following dynamic parameters added to
`env.java.opts.taskmanager` in `config.yaml`:

  -Dorg.apache.flink.shaded.netty4.io.netty.tryReflectionSetAccessible=true
  -Dorg.apache.flink.shaded.netty4.io.netty.leakDetection.level=PARANOID

and see whether it makes things better and/or gives more information
about the leak?
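For reference, here is a minimal sketch of how this could look in `config.yaml`. It assumes `env.java.opts.taskmanager` is not already set there. If it is, the two flags would need to be appended to the existing value, separated by a space:

  # Hypothetical config.yaml excerpt: both -D flags go into one string value
  env.java.opts.taskmanager: "-Dorg.apache.flink.shaded.netty4.io.netty.tryReflectionSetAccessible=true -Dorg.apache.flink.shaded.netty4.io.netty.leakDetection.level=PARANOID"

Netty documents the PARANOID leak detection level as having the highest overhead and intended for testing, so it is probably best limited to a diagnostic run.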

Thanks,
Ferenc

[1] https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17911219



On Tuesday, January 7th, 2025 at 03:44, kerr <hepin1...@gmail.com> wrote:

> 
> 
> I will try to do a ping-pong test this weekend to find out whether there
> are any leaks on the Pekko side.
> 
> One difference from the old transport is that the new one uses
> PooledByteBufAllocator by default.
> 
> 何品
> 
> 
> On Tue, 7 Jan 2025 at 00:42, PJ Fanning fannin...@gmail.com wrote:
> 
> > We don't have any changes in 1.1.3 that might help Flink.
> > 
> > We haven't yet tracked down a reproducible case that doesn't involve
> > running the Flink tests. See
> > https://github.com/apache/pekko/issues/1634
> > 
> > On Mon, 6 Jan 2025 at 16:17, Alexander Fedulov
> > alexander.fedu...@gmail.com wrote:
> > 
> > > @PJ Fanning,
> > > Do you have a rough estimate when Pekko 1.1.3 might get released?
> > > 
> > > Best,
> > > Alex
> > > 
> > > On Mon, 30 Dec 2024 at 04:12, ConradJam czy...@apache.org wrote:
> > > 
> > > > By the way, I tried using Flink 1.20 to consume from Kafka and write
> > > > Iceberg data, but with the default parameters it quickly ran into an
> > > > OOM. I had to increase the TaskManager off-heap memory. After
> > > > increasing it, I ran the job for 3 days and observed that the
> > > > TaskManager's Outside JVM memory kept slowly increasing until OOM,
> > > > which may also be related to this issue.
> > > > 
> > > > On Sun, 29 Dec 2024 at 21:22, Matthias Pohl map...@apache.org wrote:
> > > > 
> > > > > FYI: The following Flink Jira issues are related to this comment:
> > > > > - FLINK-36290 [1]: OOM in CI
> > > > > - FLINK-36510 [2]: Netty version bump, which was backported to 1.20
> > > > >   and 1.19
> > > > > 
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-36290
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-36510
> > > > > 
> > > > > On Sat, Dec 28, 2024 at 4:02 PM PJ Fanning fannin...@apache.org
> > > > > wrote:
> > > > > 
> > > > > > We recommend that you revert to Pekko 1.0, which uses
> > > > > > Netty 3.
> > > > > > 
> > > > > > We will notify you when Pekko 1.1.3 is released.
> > > > > > 
> > > > > > On 2024/12/28 14:54:02 kerr wrote:
> > > > > > 
> > > > > > > First, super sorry about this.
> > > > > > > https://github.com/apache/pekko/pull/1635
> > > > > > > 
> > > > > > > I just noticed this issue and did a quick check on it.
> > > > > > > The memory leak happens because I forgot to call `ByteBuf#release`.
> > > > > > > 
> > > > > > > We will try to get a 1.1.3 release out soon.
> > > > > > > 
> > > > > > > 何品
> > > > 
> > > > --
> > > > Best
> > > > 
> > > > ConradJam
