Hi, Ipengdream. I will drive this work. We will support this functionality via hints, because "distribute by" is not in the SQL standard. It will, however, be supported in the Hive dialect. I will post the FLIP doc soon.
Best,
Godfrey

Jark Wu <imj...@gmail.com> wrote on Mon, 9 May 2022 at 16:03:

> We will start a FLIP discussion on the dev mailing list, so please watch
> the ML.
> I also see that you opened FLINK-27541; we will update FLINK-27541
> once we have an initial FLIP.
>
> Best,
> Jark
>
> On Mon, 9 May 2022 at 15:18, lpengdr...@163.com <lpengdr...@163.com> wrote:
>
> > Yeah! That's great. Thank you! Where can I get more information about
> > that?
> >
> > lpengdr...@163.com
> >
> > From: Jark Wu
> > Sent: 2022-05-09 14:12
> > To: dev
> > Cc: 贺小令
> > Subject: Re: Re: 【Could we support distribute by For FlinkSql】
> > I got what you want, maybe something like DISTRIBUTE BY in Hive SQL.
> > The community is planning to support this feature but has not started yet.
> > @Godfrey will drive this work.
> >
> > Best,
> > Jark
> >
> > On Mon, 9 May 2022 at 13:45, lpengdr...@163.com <lpengdr...@163.com>
> > wrote:
> >
> > > Hi
> > > Thanks for your reply.
> > > What I want is not only for the hash lookup join; there are many
> > > operators that need a hash operation to solve the skew problem.
> > > Lookup join is just one special case.
> > > So I hope there could be an operator that performs a shuffle. Maybe
> > > that is a way to solve these problems?
> > >
> > > https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
> > >
> > > lpengdr...@163.com
> > >
> > > From: Jark Wu
> > > Sent: 2022-05-09 12:27
> > > To: dev
> > > Subject: Re: 【Could we support distribute by For FlinkSql】
> > > Hi,
> > >
> > > If you are looking for the hash lookup join, there is an in-progress
> > > FLIP-204[1] working on it.
> > >
> > > Btw, I still can't see your picture. You can upload your picture to some
> > > image service and share a link here.
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >
> > > On Mon, 9 May 2022 at 11:22, lpengdr...@163.com <lpengdr...@163.com>
> > > wrote:
> > >
> > > > Sorry!
> > > > The broken picture is the attachment;
> > > >
> > > > ------------------------------
> > > > lpengdr...@163.com
> > > >
> > > > *From:* lpengdr...@163.com
> > > > *Sent:* 2022-05-09 11:16
> > > > *To:* user-zh <user...@flink.apache.org>; dev <dev@flink.apache.org>
> > > > *Subject:* 【Could we support distribute by For FlinkSql】
> > > > Hello:
> > > > Currently we can't add a shuffle operation in a SQL job.
> > > > For example, I have a Kafka source (three partitions) with
> > > > parallelism three, followed by a lookup-join function. I want to
> > > > process the data distributed by id so that it is split evenly across
> > > > the three parallel instances (the source may be seriously skewed).
> > > > In the DataStream API I can do this with keyBy(), but sadly I can do
> > > > nothing when I use SQL.
> > > > Maybe we can do it like
> > > > 'select id, f1,f2 from sourceTable distribute by id',
> > > > as we do in Spark SQL.
> > > >
> > > > So that we can make the change in the picture in SQL mode;
> > > >
> > > > ------------------------------
> > > > lpengdr...@163.com