Hi Rex,

for questions like this, I would recommend to checkout the source code as well.

Search for subclasses of `StreamPartitioner`. For example, for keyBy Flink uses:

https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java

which uses

https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java


Flink tries to avoid redistribution. Basically redistribution only occurs when performing a GROUP BY or when having operators with different parallelism. For Table API and SQL, you can print the shuffling steps via `Table.explain()`. They are indicated with an `Exchange` operation

I hope this helps.

Regards,
Timo


On 16.01.21 19:45, Rex Fenley wrote:
Hello,

I'm wondering what sort of algorithm flink uses to map an Integer ID to a subtask when distributing data. Also, what operators from the TableAPI cause data to be redistributed? I know Joins will, what about Aggregates, Sources, Filters?

Thanks!

--

Rex Fenley|Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>| BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>


Reply via email to