Hey everyone,

I'd be great to start voting next week. Let me know if there are further
questions or feedback.

Thanks,
Gustavo

Am Mi., 30. Apr. 2025 um 15:07 Uhr schrieb Gustavo de Morais <
gustavopg...@gmail.com>:

> Hey Arvid and David, thanks for the feedback!
>
> The limitations are in the flip, I just had pasted a wrong link and fixed
> it. Let me know if there are other incorrect links.
>
> Yes, the thought of using statistics has potential. I've also spent some
> on that. The precise statistics required here would however be the amount
> of intermediate state/matches for each level and this is an information we
> only have at runtime/inside the operator. For that, we could look into an
> adaptive multi-way join in a next interaction and the user could determine
> a max amount of state he's willing to store. This has potential but would
> be a topic for a next FLIP, I added some information on that under the
> rejected alternatives.
>
> Kind regards,
> Gustavo
>
> Am Mo., 28. Apr. 2025 um 14:18 Uhr schrieb David Radley <
> david_rad...@uk.ibm.com>:
>
>> Hi Gustavo,This sounds like a great idea.
>> I notice the link  limitations<
>> https://confluentinc.atlassian.net/wiki/spaces/FLINK/pages/4342875697/FLIP-516+Multi-Way+Join+Operator#Limitations>
>> in the Flip points outside of the document to something I do not have
>> access to. Please could you include the limitations in the flip itself.
>>
>> You mention re ordered binary joins might be less efficient by turning
>> them into a multi join. I wonder what the pros and cons are. I wonder can
>> we use statistics to decide whether we should do a multi way join? In this
>> case we could have an enum configuration something like:
>> table.optimizer.join= binary-join, multi-join, auto.
>>
>> Kind regards, David.
>>
>>
>> From: Arvid Heise <ar...@apache.org>
>> Date: Monday, 28 April 2025 at 12:47
>> To: dev@flink.apache.org <dev@flink.apache.org>
>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-516: Multi-Way Join Operator
>> Hi Gustavo,
>>
>> the idea and approach LGTM. +1 to proceed.
>>
>> Best,
>>
>> Arvid
>>
>> On Thu, Apr 24, 2025 at 4:58 PM Gustavo de Morais <gustavopg...@gmail.com
>> >
>> wrote:
>>
>> > Hi everyone,
>> >
>> > I'd like to propose FLIP-516: Multi-Way Join Operator [1] for
>> discussion.
>> >
>> > Chained non-temporal joins in Flink SQL often cause a "big state issue"
>> due
>> > to large intermediate results, impacting performance and stability. This
>> > FLIP introduces a StreamingMultiJoinOperator to tackle this by joining
>> > multiple inputs (that need to share a common key) simultaneously within
>> one
>> > operator.
>> >
>> > The main goal is achieving zero intermediate state for these common join
>> > patterns, significantly reducing state size. This initial version
>> requires
>> > a common partitioning key and focuses on INNER/LEFT joins, with plans
>> for
>> > future expansion. The operator is opt-in via
>> > table.optimizer.multi-join.enabled (default false). PR with the initial
>> > version of the operator is available [2].
>> >
>> > Happy to be contributing to this community, and looking forward to your
>> > feedback and thoughts.
>> >
>> > Kind regards,
>> > Gustavo de Morais
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator
>> >
>> > [2] https://github.com/apache/flink/pull/26313
>> >
>>
>> Unless otherwise stated above:
>>
>> IBM United Kingdom Limited
>> Registered in England and Wales with number 741598
>> Registered office: Building C, IBM Hursley Office, Hursley Park Road,
>> Winchester, Hampshire SO21 2JN
>>
>

Reply via email to