I made a small change to the FLIP to add one additional configuration, "table.optimizer.multi-join.max-tables". This is only an additional functionality to allow users to pick the granularity of how many inputs they want their multi-way join operator to have, making it possible to break one multi-way join operator into multiple multi-way join operators. This is useful for specific cases where one single operator with a lot of inputs might become stressed (e.g. there's one single key-value data skew or in other extreme cases).
By the way, voting is open. It'd be great to have your +1 over there as well: https://lists.apache.org/thread/3z70lcpotdx6785dpsrvmncwyfb5mnnw Thanks everyone, Gustavo Am Fr., 9. Mai 2025 um 12:42 Uhr schrieb Gustavo de Morais < gustavopg...@gmail.com>: > Sure, that sounds good. I'll make sure we create a ticket for batch. > > Am Fr., 9. Mai 2025 um 12:39 Uhr schrieb Arvid Heise <ar...@apache.org>: > >> It makes sense to start with streaming and have the option apply only for >> streaming. We should also create a ticket for the batch implementation in >> the epic for the FLIP even if you don't plan to work on that, Gustavo. >> >> However, we should at least output a warning that this option is not used >> in batch until it is. WDYT? >> >> Best, >> >> Arvid >> >> On Fri, May 9, 2025 at 12:33 PM Gustavo de Morais <gustavopg...@gmail.com >> > >> wrote: >> >> > Hi Ron, >> > >> > Happy to have your support for the FLIP. Yes, the new option will be >> only >> > effective for streaming. Batch will continue to work as it currently >> does. >> > In general, a batch implementation could use completely different and >> more >> > efficient algorithms for its use case to perform a multi join, like the >> > leapfrog triejoin. >> > >> > Best, >> > Gustavo >> > >> > Am Do., 8. Mai 2025 um 05:07 Uhr schrieb Ron Liu <ron9....@gmail.com>: >> > >> > > Hi, Gustavo >> > > >> > > Sorry for the late participation in the FLIP discussion, this is a >> great >> > > feature to solve the headache of the stream join, Big +1. >> > > >> > > Regarding the new config option `table.optimizer.multi-join.enabled`, >> I >> > > have a question: is this option only effective in streaming mode, >> what is >> > > its behavior in batch mode? >> > > >> > > Best, >> > > Ron >> > > >> > > Gustavo de Morais <gustavopg...@gmail.com> 于2025年5月8日周四 00:03写道: >> > > >> > > > Hey Ferenc, that's a great callout. I'll make sure we add some >> > > > documentation regarding general advice on when to use multi-way >> joins >> > > (pros >> > > > and cons). >> > > > >> > > > Am Di., 6. Mai 2025 um 17:23 Uhr schrieb Ferenc Csaky >> > > > <ferenc.cs...@pm.me.invalid>: >> > > > >> > > > > Hi, >> > > > > >> > > > > I think the FLIP is in a fairly good state, +1 for the idea and >> the >> > > given >> > > > > design. This may be considered already, but IMO we should also add >> > some >> > > > > high-level details, pros, and cons of enabling this feature to the >> > > > website >> > > > > other than the config option description. >> > > > > >> > > > > Best, >> > > > > Ferenc >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > On Friday, May 2nd, 2025 at 14:47, Gustavo de Morais < >> > > > > gustavopg...@gmail.com> wrote: >> > > > > >> > > > > > >> > > > > > >> > > > > > Hey everyone, >> > > > > > >> > > > > > I'd be great to start voting next week. Let me know if there are >> > > > further >> > > > > > questions or feedback. >> > > > > > >> > > > > > Thanks, >> > > > > > Gustavo >> > > > > > >> > > > > > Am Mi., 30. Apr. 2025 um 15:07 Uhr schrieb Gustavo de Morais < >> > > > > > gustavopg...@gmail.com>: >> > > > > > >> > > > > > > Hey Arvid and David, thanks for the feedback! >> > > > > > > >> > > > > > > The limitations are in the flip, I just had pasted a wrong >> link >> > and >> > > > > fixed >> > > > > > > it. Let me know if there are other incorrect links. >> > > > > > > >> > > > > > > Yes, the thought of using statistics has potential. I've also >> > spent >> > > > > some >> > > > > > > on that. The precise statistics required here would however be >> > the >> > > > > amount >> > > > > > > of intermediate state/matches for each level and this is an >> > > > > information we >> > > > > > > only have at runtime/inside the operator. For that, we could >> look >> > > > into >> > > > > an >> > > > > > > adaptive multi-way join in a next interaction and the user >> could >> > > > > determine >> > > > > > > a max amount of state he's willing to store. This has >> potential >> > but >> > > > > would >> > > > > > > be a topic for a next FLIP, I added some information on that >> > under >> > > > the >> > > > > > > rejected alternatives. >> > > > > > > >> > > > > > > Kind regards, >> > > > > > > Gustavo >> > > > > > > >> > > > > > > Am Mo., 28. Apr. 2025 um 14:18 Uhr schrieb David Radley < >> > > > > > > david_rad...@uk.ibm.com>: >> > > > > > > >> > > > > > > > Hi Gustavo,This sounds like a great idea. >> > > > > > > > I notice the link limitations< >> > > > > > > > >> > > > > >> > > > >> > > >> > >> https://confluentinc.atlassian.net/wiki/spaces/FLINK/pages/4342875697/FLIP-516+Multi-Way+Join+Operator#Limitations >> > > > > > >> > > > > > > > in the Flip points outside of the document to something I do >> > not >> > > > have >> > > > > > > > access to. Please could you include the limitations in the >> flip >> > > > > itself. >> > > > > > > > >> > > > > > > > You mention re ordered binary joins might be less efficient >> by >> > > > > turning >> > > > > > > > them into a multi join. I wonder what the pros and cons >> are. I >> > > > > wonder can >> > > > > > > > we use statistics to decide whether we should do a multi way >> > > join? >> > > > > In this >> > > > > > > > case we could have an enum configuration something like: >> > > > > > > > table.optimizer.join= binary-join, multi-join, auto. >> > > > > > > > >> > > > > > > > Kind regards, David. >> > > > > > > > >> > > > > > > > From: Arvid Heise ar...@apache.org >> > > > > > > > Date: Monday, 28 April 2025 at 12:47 >> > > > > > > > To: dev@flink.apache.org dev@flink.apache.org >> > > > > > > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-516: Multi-Way Join >> > > Operator >> > > > > > > > Hi Gustavo, >> > > > > > > > >> > > > > > > > the idea and approach LGTM. +1 to proceed. >> > > > > > > > >> > > > > > > > Best, >> > > > > > > > >> > > > > > > > Arvid >> > > > > > > > >> > > > > > > > On Thu, Apr 24, 2025 at 4:58 PM Gustavo de Morais < >> > > > > gustavopg...@gmail.com >> > > > > > > > >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Hi everyone, >> > > > > > > > > >> > > > > > > > > I'd like to propose FLIP-516: Multi-Way Join Operator [1] >> for >> > > > > > > > > discussion. >> > > > > > > > > >> > > > > > > > > Chained non-temporal joins in Flink SQL often cause a "big >> > > state >> > > > > issue" >> > > > > > > > > due >> > > > > > > > > to large intermediate results, impacting performance and >> > > > > stability. This >> > > > > > > > > FLIP introduces a StreamingMultiJoinOperator to tackle >> this >> > by >> > > > > joining >> > > > > > > > > multiple inputs (that need to share a common key) >> > > simultaneously >> > > > > within >> > > > > > > > > one >> > > > > > > > > operator. >> > > > > > > > > >> > > > > > > > > The main goal is achieving zero intermediate state for >> these >> > > > > common join >> > > > > > > > > patterns, significantly reducing state size. This initial >> > > version >> > > > > > > > > requires >> > > > > > > > > a common partitioning key and focuses on INNER/LEFT joins, >> > with >> > > > > plans >> > > > > > > > > for >> > > > > > > > > future expansion. The operator is opt-in via >> > > > > > > > > table.optimizer.multi-join.enabled (default false). PR >> with >> > the >> > > > > initial >> > > > > > > > > version of the operator is available [2]. >> > > > > > > > > >> > > > > > > > > Happy to be contributing to this community, and looking >> > forward >> > > > to >> > > > > your >> > > > > > > > > feedback and thoughts. >> > > > > > > > > >> > > > > > > > > Kind regards, >> > > > > > > > > Gustavo de Morais >> > > > > > > > > >> > > > > > > > > [1] >> > > > > > > > >> > > > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator >> > > > > > > > >> > > > > > > > > [2] https://github.com/apache/flink/pull/26313 >> > > > > > > > >> > > > > > > > Unless otherwise stated above: >> > > > > > > > >> > > > > > > > IBM United Kingdom Limited >> > > > > > > > Registered in England and Wales with number 741598 >> > > > > > > > Registered office: Building C, IBM Hursley Office, Hursley >> Park >> > > > Road, >> > > > > > > > Winchester, Hampshire SO21 2JN >> > > > > >> > > > >> > > >> > >> >