Re: Re: 【Could we support distribute by For FlinkSql】

godfrey he Tue, 10 May 2022 02:34:20 -0700

Hi, Ipengdream. I will drive this work.
We will support this functionality via hints,
because "distribute by" is not in the SQL standard.
But it will be supported in the Hive dialect.
I will post the FLIP doc soon.


Best,
Godfrey


On Mon, 9 May 2022 at 16:03, Jark Wu <imj...@gmail.com> wrote:

>
> We will start a FLIP discussion on the dev mailing list, so please watch
> the ML.
> I also see that you opened FLINK-27541; we will update FLINK-27541
> once we have an initial FLIP.
>
> Best,
> Jark
>
> On Mon, 9 May 2022 at 15:18, lpengdr...@163.com <lpengdr...@163.com> wrote:
>
> > Yeah! That's great. Thank you! Where can I get more information about
> > that?
> >
> >
> >
> > lpengdr...@163.com
> >
> > From: Jark Wu
> > Sent: 2022-05-09 14:12
> > To: dev
> > Cc: 贺小令
> > Subject: Re: Re: 【Could we support distribute by For FlinkSql】
> > I see what you want, maybe something like DISTRIBUTE BY in Hive SQL.
> > The community is planning to support this feature but has not started yet.
> > @Godfrey will drive this work.
> >
> > Best,
> > Jark
> >
> > On Mon, 9 May 2022 at 13:45, lpengdr...@163.com <lpengdr...@163.com>
> > wrote:
> >
> > > Hi,
> > >     Thanks for your reply.
> > >     What I want is not only for the hash lookup join; there are many
> > > operators that need a hash operation to solve the skew problem. Lookup
> > > join is just one special case.
> > >     So I hope there is an operator that can perform a shuffle. Maybe
> > > that is a way to solve these problems?
> > >
> > >
> > >
> > > https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
> > >
> > >
> > >
> > >
> > >
> > > lpengdr...@163.com
> > >
> > > From: Jark Wu
> > > Sent: 2022-05-09 12:27
> > > To: dev
> > > Subject: Re: 【Could we support distribute by For FlinkSql】
> > > Hi,
> > >
> > > If you are looking for the hash lookup join, there is an in-progress
> > > FLIP-204 [1] working on it.
> > >
> > > Btw, I still can't see your picture. You can upload your picture to some
> > > image service and share a link here.
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >
> > > On Mon, 9 May 2022 at 11:22, lpengdr...@163.com <lpengdr...@163.com>
> > > wrote:
> > >
> > > > Sorry!
> > > > The broken picture is in the attachment.
> > > >
> > > > ------------------------------
> > > > lpengdr...@163.com
> > > >
> > > >
> > > > *From:* lpengdr...@163.com
> > > > *Sent:* 2022-05-09 11:16
> > > > *To:* user-zh <user...@flink.apache.org>; dev <dev@flink.apache.org>
> > > > *Subject:* 【Could we support distribute by For FlinkSql】
> > > > Hello,
> > > >     Currently we can't add a shuffle operation in a SQL job.
> > > > For example, I have a Kafka source (three partitions) with
> > > > parallelism three, followed by a lookup join function. I want to
> > > > process the data distributed by id so that it is split evenly across
> > > > the three parallel instances (the source may be seriously skewed).
> > > > In the DataStream API I can do this with keyBy(), but sadly I can
> > > > do nothing when I use SQL.
> > > > Maybe we could write something like 'select id, f1,f2 from sourceTable
> > > > distribute by id', as we do in Spark SQL.
> > > >
> > > > So that we can make the change shown in the picture in SQL mode.
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > > lpengdr...@163.com
> > > >
> > > >
> > >
> >
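
For readers of this thread, the effect requested in the original message (redistributing records by id before the lookup join) can be sketched in plain Java. This is only an illustration, not Flink code: the class DistributeBySketch and the methods subtaskFor and distribute are hypothetical names, and the plain hashCode-modulo routing is an assumed simplification. Flink's keyBy() routes keys through key groups, so its actual placement will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DistributeBySketch {

    // Hypothetical helper: choose a target subtask from the hash of the id.
    // Simplification: Flink's keyBy() uses key groups, not a plain modulo.
    public static int subtaskFor(String id, int parallelism) {
        return Math.floorMod(id.hashCode(), parallelism);
    }

    // Count how many records each subtask receives after redistributing by id.
    public static int[] distribute(List<String> ids, int parallelism) {
        int[] counts = new int[parallelism];
        for (String id : ids) {
            counts[subtaskFor(id, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Records with distinct ids. Which source partition they arrived on
        // no longer matters once they are routed by id.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("id-" + i);
        }
        // Prints the number of records per subtask for parallelism 3.
        System.out.println(Arrays.toString(distribute(ids, 3)));
    }
}
```

Routing by id only evens out the load when the ids themselves are reasonably varied; if one id dominates the stream, all of its records still land on the same subtask.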
