I made a small change to the FLIP to add one more configuration option,
"table.optimizer.multi-join.max-tables". It only adds the ability for users
to limit how many inputs a single multi-way join operator may have, which
makes it possible to break one multi-way join operator into several smaller
ones. This is useful in cases where a single operator with many inputs might
come under heavy load (e.g. when the data is skewed towards a single key, or
in other extreme cases).
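
To make the effect of the option concrete, here is a minimal sketch in
Python. It is not Flink planner code: the function name split_multi_join,
the upstream names multi_join_N, and the chaining strategy (each further
operator consumes the previous operator's output plus up to max_tables - 1
new inputs) are assumptions made for illustration only. The FLIP defines
the actual behavior.

```python
from typing import List


def split_multi_join(inputs: List[str], max_tables: int) -> List[List[str]]:
    """Hypothetical sketch: group join inputs into chained multi-way joins.

    Each returned list holds the inputs of one operator. From the second
    operator on, the first input is the output of the previous operator.
    """
    if max_tables < 2:
        raise ValueError("max_tables must be at least 2")
    if len(inputs) < 2:
        raise ValueError("a join needs at least two inputs")

    # The first operator takes up to max_tables source tables.
    operators = [list(inputs[:max_tables])]
    remaining = list(inputs[max_tables:])

    # Every further operator spends one input slot on the upstream result,
    # so it can take at most max_tables - 1 new source tables.
    while remaining:
        upstream = f"multi_join_{len(operators) - 1}"
        operators.append([upstream] + remaining[: max_tables - 1])
        remaining = remaining[max_tables - 1:]
    return operators


if __name__ == "__main__":
    for op in split_multi_join(["A", "B", "C", "D", "E"], max_tables=3):
        print(op)
```

Under these assumptions, five inputs with a limit of 3 yield two operators,
while a limit of 2 degenerates into a chain of four binary joins. Each split
presumably reintroduces an intermediate result between operators, so a lower
limit would trade state size for less load on any single operator.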

By the way, voting is open. It'd be great to have your +1 over there as
well: https://lists.apache.org/thread/3z70lcpotdx6785dpsrvmncwyfb5mnnw

Thanks everyone,
Gustavo

On Fri, May 9, 2025 at 12:42, Gustavo de Morais <gustavopg...@gmail.com> wrote:

> Sure, that sounds good. I'll make sure we create a ticket for batch.
>
> On Fri, May 9, 2025 at 12:39, Arvid Heise <ar...@apache.org> wrote:
>
>> It makes sense to start with streaming and have the option apply only for
>> streaming. We should also create a ticket for the batch implementation in
>> the epic for the FLIP even if you don't plan to work on that, Gustavo.
>>
>> However, we should at least output a warning that this option is not used
>> in batch until it is. WDYT?
>>
>> Best,
>>
>> Arvid
>>
>> On Fri, May 9, 2025 at 12:33 PM Gustavo de Morais <gustavopg...@gmail.com> wrote:
>>
>> > Hi Ron,
>> >
>> > Happy to have your support for the FLIP. Yes, the new option will only be
>> > effective for streaming. Batch will continue to work as it currently does.
>> > In general, a batch implementation could use completely different and more
>> > efficient algorithms for its use case to perform a multi join, like the
>> > leapfrog triejoin.
>> >
>> > Best,
>> > Gustavo
>> >
>> > On Thu, May 8, 2025 at 05:07, Ron Liu <ron9....@gmail.com> wrote:
>> >
>> > > Hi, Gustavo
>> > >
>> > > Sorry for joining the FLIP discussion late. This is a great
>> > > feature to solve the headache of the stream join, big +1.
>> > >
>> > > Regarding the new config option `table.optimizer.multi-join.enabled`, I
>> > > have a question: is this option only effective in streaming mode, and what
>> > > is its behavior in batch mode?
>> > >
>> > > Best,
>> > > Ron
>> > >
>> > > On Thu, May 8, 2025 at 00:03, Gustavo de Morais <gustavopg...@gmail.com> wrote:
>> > >
>> > > > Hey Ferenc, that's a great callout. I'll make sure we add some
>> > > > documentation regarding general advice on when to use multi-way joins
>> > > > (pros and cons).
>> > > >
>> > > > On Tue, May 6, 2025 at 17:23, Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I think the FLIP is in a fairly good state, +1 for the idea and the given
>> > > > > design. This may have been considered already, but IMO we should also add
>> > > > > some high-level details, pros, and cons of enabling this feature to the
>> > > > > website, beyond the config option description.
>> > > > >
>> > > > > Best,
>> > > > > Ferenc
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Friday, May 2nd, 2025 at 14:47, Gustavo de Morais <
>> > > > > gustavopg...@gmail.com> wrote:
>> > > > >
>> > > > > >
>> > > > > >
>> > > > > > Hey everyone,
>> > > > > >
>> > > > > > It'd be great to start voting next week. Let me know if there are
>> > > > > > further questions or feedback.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Gustavo
>> > > > > >
>> > > > > > On Wed, Apr 30, 2025 at 15:07, Gustavo de Morais <gustavopg...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hey Arvid and David, thanks for the feedback!
>> > > > > > >
>> > > > > > > The limitations are in the FLIP; I had just pasted a wrong link and
>> > > > > > > fixed it. Let me know if there are other incorrect links.
>> > > > > > >
>> > > > > > > Yes, the thought of using statistics has potential. I've also spent
>> > > > > > > some time on that. The precise statistics required here would, however,
>> > > > > > > be the amount of intermediate state/matches for each level, and this is
>> > > > > > > information we only have at runtime, inside the operator. For that, we
>> > > > > > > could look into an adaptive multi-way join in a next iteration, where
>> > > > > > > the user could set the maximum amount of state they're willing to
>> > > > > > > store. This has potential but would be a topic for a next FLIP. I added
>> > > > > > > some information on that under the rejected alternatives.
>> > > > > > >
>> > > > > > > Kind regards,
>> > > > > > > Gustavo
>> > > > > > >
>> > > > > > > On Mon, Apr 28, 2025 at 14:18, David Radley <david_rad...@uk.ibm.com> wrote:
>> > > > > > >
>> > > > > > > > Hi Gustavo,
>> > > > > > > >
>> > > > > > > > This sounds like a great idea.
>> > > > > > > > I notice the "limitations" link
>> > > > > > > > <https://confluentinc.atlassian.net/wiki/spaces/FLINK/pages/4342875697/FLIP-516+Multi-Way+Join+Operator#Limitations>
>> > > > > > > > in the FLIP points outside of the document to something I do not have
>> > > > > > > > access to. Please could you include the limitations in the FLIP itself.
>> > > > > > > >
>> > > > > > > > You mention that reordered binary joins might become less efficient
>> > > > > > > > when turned into a multi join. I wonder what the pros and cons are, and
>> > > > > > > > whether we could use statistics to decide whether to do a multi-way
>> > > > > > > > join. In that case we could have an enum configuration, something like:
>> > > > > > > > table.optimizer.join = binary-join, multi-join, auto.
>> > > > > > > >
>> > > > > > > > Kind regards, David.
>> > > > > > > >
>> > > > > > > > From: Arvid Heise ar...@apache.org
>> > > > > > > > Date: Monday, 28 April 2025 at 12:47
>> > > > > > > > To: dev@flink.apache.org dev@flink.apache.org
>> > > > > > > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-516: Multi-Way Join Operator
>> > > > > > > > Hi Gustavo,
>> > > > > > > >
>> > > > > > > > the idea and approach LGTM. +1 to proceed.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > >
>> > > > > > > > Arvid
>> > > > > > > >
>> > > > > > > > On Thu, Apr 24, 2025 at 4:58 PM Gustavo de Morais <gustavopg...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Hi everyone,
>> > > > > > > > >
>> > > > > > > > > I'd like to propose FLIP-516: Multi-Way Join Operator [1] for
>> > > > > > > > > discussion.
>> > > > > > > > >
>> > > > > > > > > Chained non-temporal joins in Flink SQL often cause a "big state issue"
>> > > > > > > > > due to large intermediate results, impacting performance and stability.
>> > > > > > > > > This FLIP introduces a StreamingMultiJoinOperator to tackle this by
>> > > > > > > > > joining multiple inputs (that need to share a common key) simultaneously
>> > > > > > > > > within one operator.
>> > > > > > > > >
>> > > > > > > > > The main goal is achieving zero intermediate state for these common join
>> > > > > > > > > patterns, significantly reducing state size. This initial version
>> > > > > > > > > requires a common partitioning key and focuses on INNER/LEFT joins, with
>> > > > > > > > > plans for future expansion. The operator is opt-in via
>> > > > > > > > > table.optimizer.multi-join.enabled (default false). A PR with the
>> > > > > > > > > initial version of the operator is available [2].
>> > > > > > > > >
>> > > > > > > > > Happy to be contributing to this community, and looking forward to
>> > > > > > > > > your feedback and thoughts.
>> > > > > > > > >
>> > > > > > > > > Kind regards,
>> > > > > > > > > Gustavo de Morais
>> > > > > > > > >
>> > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator
>> > > > > > > > > [2] https://github.com/apache/flink/pull/26313
>> > > > > > > >
