Hi
Thanks for your reply.
What I want is not only for the hash lookup join; there are many
operators that need a hash operation to solve the skew problem. The lookup join is
just one special case.
So I hope there could be an operator that performs a shuffle. Maybe this is a way to
solve the problem?
https://docs.google.com/document/d/1D7AX-_wttMNY53TxLQxiDaRyDVCeEZYCE8AwYflDXZM/edit?usp=sharing
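
To make the skew problem concrete, here is a minimal sketch in plain Python (not Flink code). It assumes a hypothetical source with three partitions where partition 0 holds most of the records, and it uses a simple `id % parallelism` as a stand-in for hash distribution; Flink's real keyBy() uses its own key-group assignment, so the modulo is only an illustration.

```python
from collections import Counter

PARALLELISM = 3

# Hypothetical skewed input: (source_partition, id) pairs.
# Partition 0 holds 900 of the 1000 records.
records = (
    [(0, i) for i in range(900)]
    + [(1, i) for i in range(900, 950)]
    + [(2, i) for i in range(950, 1000)]
)

def forward_load(recs):
    """Records per subtask when each source partition feeds one subtask directly."""
    return Counter(partition for partition, _ in recs)

def distribute_by_id_load(recs, parallelism):
    """Records per subtask after redistributing by id (modulo as a stand-in hash)."""
    return Counter(record_id % parallelism for _, record_id in recs)

forward = forward_load(records)
shuffled = distribute_by_id_load(records, PARALLELISM)

print("without shuffle:", dict(forward))
print("distribute by id:", dict(shuffled))
```

Without the shuffle, the downstream operator on subtask 0 handles almost all of the records; after distributing by id, the three subtasks receive nearly equal shares.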
[email protected]
From: Jark Wu
Sent: 2022-05-09 12:27
To: dev
Subject: Re: 【Could we support distribute by For FlinkSql】
Hi,
If you are looking for the hash lookup join, FLIP-204 [1] is in progress
and covers it.
Btw, I still can't see your picture. You can upload your picture to some
image service and share a link here.
Best,
Jark
[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
On Mon, 9 May 2022 at 11:22, [email protected] <[email protected]> wrote:
> Sorry!
> The picture that failed to display is in the attachment.
>
> ------------------------------
> [email protected]
>
>
> *From:* [email protected]
> *Sent:* 2022-05-09 11:16
> *To:* user-zh <[email protected]>; dev <[email protected]>
> *Subject:* 【Could we support distribute by For FlinkSql】
> Hello,
> Currently we can't add a shuffle operation in a SQL job.
> For example, suppose I have a Kafka source (three partitions) with
> parallelism three, followed by a lookup join function. I want to process
> the data distributed by id so that it is split evenly across the three
> parallel subtasks (the source may be seriously skewed).
> In the DataStream API I can do this with keyBy(), but unfortunately I can do
> nothing when I use SQL.
> Maybe we could support something like 'select id, f1,f2 from sourceTable distribute by
> id', as Spark SQL does.
>
> That way we could make the change shown in the picture in SQL mode.
>
>
>
> ------------------------------
> [email protected]
>
>