This is great info. Looks like it uses murmur hash below the surface too [1].
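
For anyone else following the thread, here is a rough sketch of how I read the assignment logic in [1]. It is my own standalone paraphrase, not a copy of the Flink source: the class name `KeyGroupSketch` and the method names are made up for illustration, the hash constants are the usual 32-bit murmur3 ones as I understand them, and the real code on master may differ in detail. The idea is that the key's `hashCode()` is scrambled with a murmur-style hash, reduced modulo the max parallelism to get a key group, and the key group is then mapped proportionally onto a subtask index.

```java
// Illustrative sketch only: names are hypothetical, not Flink's API.
class KeyGroupSketch {

    // Murmur-style scramble of a 32-bit hash code, folded to a non-negative int.
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;

        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;

        code ^= 4; // length of the hashed input in bytes

        // Final avalanche step (bit mix).
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;

        if (code >= 0) {
            return code;
        } else if (code != Integer.MIN_VALUE) {
            return -code;
        } else {
            return 0;
        }
    }

    // Key -> key group in [0, maxParallelism).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // Key group -> subtask index in [0, parallelism).
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for the example
        int parallelism = 4;      // assumed value for the example
        Integer id = 42;
        int keyGroup = assignToKeyGroup(id, maxParallelism);
        int subtask = operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup);
        System.out.println("id=" + id + " keyGroup=" + keyGroup + " subtask=" + subtask);
    }
}
```

If this reading is right, consecutive Integer IDs should not land on consecutive subtasks, since the hash is applied before the modulo.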
Thanks!

[1] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java#L76

On Mon, Jan 18, 2021 at 1:38 AM Timo Walther <twal...@apache.org> wrote:

> Hi Rex,
>
> for questions like this, I would recommend to checkout the source code
> as well.
>
> Search for subclasses of `StreamPartitioner`. For example, for keyBy
> Flink uses:
>
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java
>
> which uses
>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
>
> Flink tries to avoid redistribution. Basically redistribution only
> occurs when performing a GROUP BY or when having operators with
> different parallelism. For Table API and SQL, you can print the
> shuffling steps via `Table.explain()`. They are indicated with an
> `Exchange` operation
>
> I hope this helps.
>
> Regards,
> Timo
>
> On 16.01.21 19:45, Rex Fenley wrote:
> > Hello,
> >
> > I'm wondering what sort of algorithm flink uses to map an Integer ID to
> > a subtask when distributing data. Also, what operators from the TableAPI
> > cause data to be redistributed? I know Joins will, what about
> > Aggregates, Sources, Filters?
> >
> > Thanks!

--
Rex Fenley | Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>