+1 for option (2)

Marton

On Wed, Jan 22, 2025 at 2:10 PM Maximilian Michels <m...@apache.org> wrote:

> +1 for option (2), looks like the best compromise given the constraints.
>
> -Max
>
> On Tue, Jan 21, 2025 at 2:27 PM Ferenc Csaky <ferenc.cs...@pm.me.invalid>
> wrote:
> >
> > FYI, I opened a PR about documenting the fine-tuning options for
> > Netty4, feel free to check it out [1]. If content LG, I'll open
> > PRs against the prev release branches as well.
> >
> > Thanks,
> > Ferenc
> >
> > [1] https://github.com/apache/flink/pull/26043
> >
> >
> >
> >
> > On Monday, January 20th, 2025 at 19:43, gyula.f...@gmail.com <
> gyula.f...@gmail.com> wrote:
> >
> > >
> > >
> > > +1 for option 2
> > >
> > > I suggest we simply move forward with the release candidates
> > >
> > > Gyula
> > >
> > > Sent from my iPhone
> > >
> > > > On 20 Jan 2025, at 19:00, He Pin he...@apache.org wrote:
> > > >
> > > > I think @pjfanning means you configured the pekko version with the
> 1.2.0 snapshot
> > > >
> https://repository.apache.org/content/groups/snapshots/org/apache/pekko/pekko-actor_2.13/1.2.0-M0+55-a75bc7a7-SNAPSHOT/
> > > >
> > > > with `unpooled-heap` allocator type.
> > > >
> > > > and change the failing test with 7MB to see if it succeeds, then he
> is OK with releasing it in 1.1.4
> > > >
> > > > But I think option 2 will not cause many problems in real
> production, who will run a Flink with 7MB? Even my phone has 16GB RAM.
> > > >
> > > > > On 2025/01/20 15:06:09 Alexander Fedulov wrote:
> > > > > Hi Ferenc,
> > > > >
> > > > > Impacting data exchange performance is definitely not an option.
> I'm simply
> > > > > not entirely sure if Flink's network stack delegates buffer
> allocation to
> > > > > Netty or handles it "manually."
> > > > > That said, even if we could confirm that this part is not
> impacted, setting
> > > > > the global configuration is still not a good idea. We already have
> the HDFS
> > > > > filesystem dependency inside the framework that depends on Netty
> 4, plus
> > > > > potentially multiple externalized connectors.
> > > > >
> > > > > Let me summarize the options we have, knowing that the allocator
> control
> > > > > localized to Pekko is currently not available:
> > > > >
> > > > > 1) Revert the changes
> > > > > Pros:
> > > > > - Unblocks the release
> > > > > Cons:
> > > > > - Leaves us with 20 unaddressed critical CVEs
> > > > >
> > > > > 2) Upgrade to Netty 4 and let the RPC module run with Netty's
> default
> > > > > settings (also bump the limits for the memory-restricted test, as
> done on
> > > > > master):
> > > > > Pros:
> > > > > - Resolves CVEs
> > > > > - Unblocks the release
> > > > > Cons:
> > > > > - Higher memory usage with the risk of tipping containers already
> at
> > > > > their memory limit into OOM
> > > > > - Unclear if the fractional autoallocation of Flink memory needs to
> > > > > be adjusted according to the new defaults
> > > > >
> > > > > 3) Help the colleagues working on Pekko to confirm that the patch
> for
> > > > > settings fulfills our purposes [1] and wait for a new Pekko
> release that
> > > > > allows isolated Netty settings solely for flink-rpc:
> > > > > Pros:
> > > > > - Resolves the CVEs
> > > > > - Memory usage similar to Netty 3, reducing the risk of OOM
> surprises
> > > > > for users
> > > > > Cons:
> > > > > - Could take time for the new Pekko release
> > > > >
> > > > > [1]
> https://github.com/apache/pekko/pull/1709#issuecomment-2599698240
> > > > >
> > > > > Unless we can hope for a timely Pekko release, my vote would be to
> go ahead
> > > > > with option 2 and clearly document the potential need for
> increasing
> > > > > container memory limits for the sake of improved security.
> > > > >
> > > > > Best regards,
> > > > > Alex
> > > > >
> > > > > On Mon, 20 Jan 2025 at 14:11, Ferenc Csaky
> ferenc.cs...@pm.me.invalid
> > > > > wrote:
> > > > >
> > > > > > Since `flink-runtime` uses Netty4 for quite a while, I believe
> > > > > > enforcing UNPOOLED will affect shuffle performance. I did not
> > > > > > performed actual tests comparing Netty3 and Netty4 in this
> regard,
> > > > > > so I cannot back this with actual numbers, but I think losing
> > > > > > shuffle performance would affect more real-world use-cases and
> be a
> > > > > > bigger problem, than a bit more overall memory consumption for
> > > > > > RPC communication.
> > > > > >
> > > > > > To cover highly resource-limited use-cases where it might be
> useful
> > > > > > to spare some memory and performance is not critical, I would
> > > > > > suggest to document these options in the release notes and in the
> > > > > > product documentation as well. I already created a ticket for
> > > > > > that [1], so I plan to deliver it in the next couple days.
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-37099
> > > > > >
> > > > > > On Monday, January 20th, 2025 at 13:10, ConradJam
> jam.gz...@gmail.com
> > > > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Alexis Sarda-Espinosa sarda.espin...@gmail.com 于2025年1月20日周一
> 18:38写道:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > what about io.netty.maxDirectMemory [1]? Is it relevant? I
> haven't been
> > > > > > > > able to understand exactly how much that changes, but I find
> it odd
> > > > > > > > that,
> > > > > > > > for the default, <"practical max direct memory" would be 2 *
> max
> > > > > > > > memory as
> > > > > > > > defined by the JDK>.
> > > > > > > >
> > > > > > > > [1]
> > > > > >
> > > > > >
> https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L162
> > > > > >
> > > > > > > > Regards,
> > > > > > > > Alexis.
> > > > > > > >
> > > > > > > > Am Mo., 20. Jan. 2025 um 04:53 Uhr schrieb He Pin
> he...@apache.org:
> > > > > > > >
> > > > > > > > > I think so, not sure how Flink works, but if they share
> the same key
> > > > > > > > > and
> > > > > > > > > running in the same JVM process, which can be.
> > > > > > > > >
> > > > > > > > > On 2025/01/18 16:58:15 Alexander Fedulov wrote:
> > > > > > > > >
> > > > > > > > > > @He Pin,
> > > > > > > > > > Thanks for bringing this up.
> > > > > > > > > > So, if I understand correctly, the problem is that there
> is
> > > > > > > > > > currently
> > > > > > > > > > no
> > > > > > > > > > way to control the underlying allocator exclusively for
> Pekko.
> > > > > > > > > > Setting
> > > > > > > > > > `-Dio.netty.allocator.type=unpooled` would impact
> Netty's behavior
> > > > > > > > > > across
> > > > > > > > > > other parts of the framework.
> > > > > > > > > >
> > > > > > > > > > Does anyone know if this could potentially affect the
> data exchange
> > > > > > > > > > network
> > > > > > > > > > stack in `flink-runtime`, which is also based on Netty?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > > > On Sat, 18 Jan 2025 at 04:10, He Pin he...@apache.org
> wrote:
> > > > > > > > > >
> > > > > > > > > > > > +1 for Netty4 with UNPOOLED memory allocator to not
> change the
> > > > > > > > > > > > default
> > > > > > > > > > > > memory footprint.
> > > > > > > > > > >
> > > > > > > > > > > That can only be done with another release, otherwise
> if will
> > > > > > > > > > > reduce
> > > > > > > > > > > the
> > > > > > > > > > > performance.
> > > > > > > > > > >
> > > > > > > > > > > see https://github.com/apache/pekko/pull/1709
> > > > > > > > > > >
> > > > > > > > > > > On 2025/01/17 17:05:06 Maximilian Michels wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 for Netty4 with UNPOOLED memory allocator to not
> change the
> > > > > > > > > > > > default
> > > > > > > > > > > > memory footprint.
> > > > > > > > > > > >
> > > > > > > > > > > > -Max
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 17, 2025 at 1:15 PM Samrat Deb
> > > > > > > > > > > > decordea...@gmail.com
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 to move to netty4.
> > > > > > > > > > > > >
> > > > > > > > > > > > > bests,
> > > > > > > > > > > > > Samrat
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 17 Jan 2025 at 5:30 PM, Luke Chen
> show...@gmail.com
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the summary!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 to upgrade Pekko to have netty 4 in 1.19.2
> and 1.20.1
> > > > > > > > > > > > > > releases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > Luke
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jan 17, 2025 at 7:50 PM He Pin
> he...@apache.org
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 to Netty 4
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 2025/01/16 15:12:40 Alexander Fedulov wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We have one remaining blocker for the 1.19.2
> and 1.20.1
> > > > > > > > > > > > > > > > releases,
> > > > > > > > > > > > > > > > namely
> > > > > > > > > > > > > > > > the issue associated with ticket
> FLINK-36510: "Upgrade
> > > > > > > > > > > > > > > > Pekko
> > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > 1.0.1
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > 1.1.2" [1]. Here is the context:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - The flink-rpc module is currently based on
> Pekko
> > > > > > > > > > > > > > > > 1.0.1,
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > bundles
> > > > > > > > > > > > > > > > Netty version 3.10.6. Netty 3.10.6 is the
> last 3.x
> > > > > > > > > > > > > > > > release and
> > > > > > > > > > > > > > > > officially
> > > > > > > > > > > > > > > > reached EOL more than eight years ago. It
> contains at
> > > > > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > 20 known
> > > > > > > > > > > > > > > > critical vulnerabilities [2].
> > > > > > > > > > > > > > > > - FLINK-36510 [1] upgrades flink-rpc to
> Pekko 1.1.2,
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > introduces
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > long-awaited migration to Netty 4.x.
> > > > > > > > > > > > > > > > - Memory allocation in Netty 4.x differs
> from Netty 3.x
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > has a
> > > > > > > > > > > > > > > > larger
> > > > > > > > > > > > > > > > memory footprint with default settings [3].
> > > > > > > > > > > > > > > > - Norman Mauerer, Netty's project lead,
> strongly
> > > > > > > > > > > > > > > > recommends
> > > > > > > > > > > > > > > > moving
> > > > > > > > > > > > > > > > away
> > > > > > > > > > > > > > > > from Netty 3 as soon as possible [4].
> > > > > > > > > > > > > > > > - According to Norman, setting
> > > > > > > > > > > > > > > > -Dio.netty.allocator.type=unpooled
> > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > approximate Netty 3's memory behavior at the
> expense of
> > > > > > > > > > > > > > > > performance
> > > > > > > > > > > > > > > > improvements that Netty 4 would otherwise
> provide. That
> > > > > > > > > > > > > > > > said,
> > > > > > > > > > > > > > > > Netty
> > > > > > > > > > > > > > > > 4
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > -Dio.netty.allocator.type=unpooled is not
> expected to
> > > > > > > > > > > > > > > > perform
> > > > > > > > > > > > > > > > worse
> > > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > Netty 3.
> > > > > > > > > > > > > > > > - Although this change might seem too
> substantial for a
> > > > > > > > > > > > > > > > patch
> > > > > > > > > > > > > > > > release, I
> > > > > > > > > > > > > > > > propose proceeding with it due to the
> accumulated risks
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > staying
> > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > Netty
> > > > > > > > > > > > > > > > 3.10.6. This will need to be addressed in a
> 1.20 as a
> > > > > > > > > > > > > > > > patch
> > > > > > > > > > > > > > > > release
> > > > > > > > > > > > > > > > anyway,
> > > > > > > > > > > > > > > > given that 1.20 is designated as LTS, and we
> can expect
> > > > > > > > > > > > > > > > Netty
> > > > > > > > > > > > > > > > 3 to
> > > > > > > > > > > > > > > > accrue
> > > > > > > > > > > > > > > > even more CVEs over time.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Here you can find more details of the ongoing
> > > > > > > > > > > > > > > > discussion
> > > > > > > > > > > > > > > > [5].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Looking forward to hearing the community's
> thoughts on
> > > > > > > > > > > > > > > > whether we
> > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > proceed with the proposed changes.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1]
> https://issues.apache.org/jira/browse/FLINK-36510
> > > > > > > > > > > > > > > > [2]
> > > > > >
> > > > > > https://mvnrepository.com/artifact/io.netty/netty/3.10.6.Final
> > > > > >
> > > > > > > > > > > > > > > > [3]
> > > > > >
> > > > > >
> https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17911219
> > > > > >
> > > > > > > > > > > > > > > > [4]
> > > > > >
> > > > > >
> https://github.com/apache/flink/pull/25866#issuecomment-2595168560
> > > > > >
> > > > > > > > > > > > > > > > [5]
> https://github.com/apache/flink/pull/25866
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Alex
>

Reply via email to