This is great info. It looks like it uses a murmur hash under the hood
too [1].

Thanks!

[1]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java#L76
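
For anyone finding this thread later, here is a rough, self-contained sketch of how I understand the key-to-subtask mapping in [1]. It is paraphrased from my reading of the source, not copied from Flink, so the class name `KeyGroupSketch`, the method names, and the values 128 and 4 for max parallelism and parallelism are my own assumptions for illustration. The real code may differ between versions.

```java
// Hypothetical stand-alone sketch; not Flink's actual classes.
final class KeyGroupSketch {

    // Final avalanche step of a murmur-style 32-bit hash.
    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Murmur-style hash of a single int, folded to a non-negative value.
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        code = bitMix(code);
        if (code >= 0) {
            return code;
        }
        // Integer.MIN_VALUE has no positive counterpart, so map it to 0.
        return code != Integer.MIN_VALUE ? -code : 0;
    }

    // Step 1: key -> key group, in [0, maxParallelism).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // Step 2: key group -> subtask index, in [0, parallelism).
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    static int subtaskForKey(Object key, int maxParallelism, int parallelism) {
        return operatorIndexForKeyGroup(
                maxParallelism, parallelism, assignToKeyGroup(key, maxParallelism));
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for illustration
        int parallelism = 4;      // assumed value for illustration
        for (int id = 1; id <= 5; id++) {
            System.out.println("id " + id + " -> subtask "
                    + subtaskForKey(id, maxParallelism, parallelism));
        }
    }
}
```

Since `Integer.hashCode()` is just the int value, an Integer ID goes straight into the hash. The two-step mapping (key to key group, then key group to subtask) means each subtask owns a contiguous range of key groups.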

On Mon, Jan 18, 2021 at 1:38 AM Timo Walther <twal...@apache.org> wrote:

> Hi Rex,
>
> for questions like this, I would recommend checking out the source code
> as well.
>
> Search for subclasses of `StreamPartitioner`. For example, for keyBy
> Flink uses:
>
>
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java
>
> which uses
>
>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
>
>
> Flink tries to avoid redistribution. Basically, redistribution only
> occurs when performing a GROUP BY or when operators have
> different parallelism. For Table API and SQL, you can print the
> shuffling steps via `Table.explain()`. They are indicated with an
> `Exchange` operation.
>
> I hope this helps.
>
> Regards,
> Timo
>
>
> On 16.01.21 19:45, Rex Fenley wrote:
> > Hello,
> >
> > I'm wondering what sort of algorithm Flink uses to map an Integer ID to
> > a subtask when distributing data. Also, which operators from the Table API
> > cause data to be redistributed? I know Joins will; what about
> > Aggregates, Sources, Filters?
> >
> > Thanks!
> >
> > --
> >
> > Rex Fenley|Software Engineer - Mobile and Backend
> >
> >
> > Remind.com <https://www.remind.com/>| BLOG <http://blog.remind.com/> |
> > FOLLOW US <https://twitter.com/remindhq> | LIKE US
> > <https://www.facebook.com/remindhq>
> >
>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>  |  BLOG <http://blog.remind.com/>
 |  FOLLOW US <https://twitter.com/remindhq>
 |  LIKE US <https://www.facebook.com/remindhq>
