Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-14 Thread weijie guo
Hi all, Thanks for all the feedback and suggestions so far. If there are no further comments, we will open the voting thread next Monday. Best regards, Weijie weijie guo wrote on Fri, Jun 14, 2024 at 15:49: > Thanks Lincoln for the quick response. > > > Since we've decided to extend a new hint option 'shuf

Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-14 Thread weijie guo
Thanks Lincoln for the quick response. > Since we've decided to extend a new hint option 'shuffle' to the current `LOOKUP` join hint, do we support hash shuffle as well? (It seems like it shouldn't require a lot of extra work, right?) This will deliver a complete new feature to users, also because

Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-13 Thread Lincoln Lee
Thanks Weijie & Wencong for your update, including the conclusions of the offline discussion. There's one thing that needs to be confirmed in the FLIP: > The hint only provides a suggestion to the optimizer, it is not an enforcer. As a result, If the target dim table not implements SupportsLookupCustomSh

Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-12 Thread Wencong Liu
Hi Jingsong, Some of the points you mentioned are currently clarified in the updated FLIP. Please check it out. 1. Enabling custom data distribution can be done through the LOOKUP SQL Hint. There are detailed examples provided in the FLIP. 2. We will add the isDeterministic method to the `In

Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-11 Thread Zhanghao Chen
Thanks for the clarification, that makes sense. +1 for the proposal. Best, Zhanghao Chen From: weijie guo Sent: Wednesday, June 12, 2024 14:20 To: dev@flink.apache.org Subject: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup

Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-11 Thread weijie guo
ioning strategy will outperform partitioning by key? Since > you've mentioned Paimon in doc, maybe an example on Paimon. > > Best, > Zhanghao Chen > > From: weijie guo > Sent: Friday, June 7, 2024 9:59 > To: dev > Subject: [DISCUSS] F

Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-11 Thread Zhanghao Chen
7, 2024 9:59 To: dev Subject: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join Hi devs, I'd like to start a discussion about FLIP-462[1]: Support Custom Data Distribution for Input Stream of Lookup Join. Lookup Join is an important feature in Flink.

Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-11 Thread Jingsong Li
Hi all, +1 to this FLIP, many thanks to all for your proposal. isDeterministic looks good to me too. We can consider stating the following points: 1. How to enable custom data distribution? Is it a dynamic hint? Can you provide an SQL example? 2. What impact will it have when the mainstream is ch

Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-11 Thread Wencong Liu
Hi Lincoln, Thanks for your reply. Weijie and I discussed these two issues offline, and here are the results of our discussion: 1. When the user utilizes the hash lookup join hint introduced by FLIP-204[1], the `SupportsLookupCustomShuffle` interface should be ignored. This is because the hash l

Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-10 Thread Lincoln Lee
Hi Weijie, Thanks for your proposal, this will be a useful advanced optimization for connector developers! I have two questions: 1. The FLIP-204[1] hash lookup join hint is mentioned in this FLIP; in what order are the two features applied? For example, a connector that implements the `SupportsLoo

Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-06 Thread Xintong Song
+1 for this proposal. This FLIP will make it possible for each lookup join parallel task to only access and cache a subset of the data. This will significantly improve the performance and reduce the overhead when using Paimon for the dimension table. And it's general enough to also be leveraged by

[DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-06 Thread weijie guo
Hi devs, I'd like to start a discussion about FLIP-462[1]: Support Custom Data Distribution for Input Stream of Lookup Join. Lookup Join is an important feature in Flink. It is typically used to enrich a table with data that is queried from an external system. If we interact with the external s