Hey Arvid and David, thanks for the feedback! The limitations are in the flip, I just had pasted a wrong link and fixed it. Let me know if there are other incorrect links.
Yes, the thought of using statistics has potential. I've also spent some on that. The precise statistics required here would however be the amount of intermediate state/matches for each level and this is an information we only have at runtime/inside the operator. For that, we could look into an adaptive multi-way join in a next interaction and the user could determine a max amount of state he's willing to store. This has potential but would be a topic for a next FLIP, I added some information on that under the rejected alternatives. Kind regards, Gustavo Am Mo., 28. Apr. 2025 um 14:18 Uhr schrieb David Radley < david_rad...@uk.ibm.com>: > Hi Gustavo,This sounds like a great idea. > I notice the link limitations< > https://confluentinc.atlassian.net/wiki/spaces/FLINK/pages/4342875697/FLIP-516+Multi-Way+Join+Operator#Limitations> > in the Flip points outside of the document to something I do not have > access to. Please could you include the limitations in the flip itself. > > You mention re ordered binary joins might be less efficient by turning > them into a multi join. I wonder what the pros and cons are. I wonder can > we use statistics to decide whether we should do a multi way join? In this > case we could have an enum configuration something like: > table.optimizer.join= binary-join, multi-join, auto. > > Kind regards, David. > > > From: Arvid Heise <ar...@apache.org> > Date: Monday, 28 April 2025 at 12:47 > To: dev@flink.apache.org <dev@flink.apache.org> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-516: Multi-Way Join Operator > Hi Gustavo, > > the idea and approach LGTM. +1 to proceed. > > Best, > > Arvid > > On Thu, Apr 24, 2025 at 4:58 PM Gustavo de Morais <gustavopg...@gmail.com> > wrote: > > > Hi everyone, > > > > I'd like to propose FLIP-516: Multi-Way Join Operator [1] for discussion. > > > > Chained non-temporal joins in Flink SQL often cause a "big state issue" > due > > to large intermediate results, impacting performance and stability. This > > FLIP introduces a StreamingMultiJoinOperator to tackle this by joining > > multiple inputs (that need to share a common key) simultaneously within > one > > operator. > > > > The main goal is achieving zero intermediate state for these common join > > patterns, significantly reducing state size. This initial version > requires > > a common partitioning key and focuses on INNER/LEFT joins, with plans for > > future expansion. The operator is opt-in via > > table.optimizer.multi-join.enabled (default false). PR with the initial > > version of the operator is available [2]. > > > > Happy to be contributing to this community, and looking forward to your > > feedback and thoughts. > > > > Kind regards, > > Gustavo de Morais > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator > > > > [2] https://github.com/apache/flink/pull/26313 > > > > Unless otherwise stated above: > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: Building C, IBM Hursley Office, Hursley Park Road, > Winchester, Hampshire SO21 2JN >