Hi Rex,
For questions like this, I would recommend checking out the source code
as well.
Search for subclasses of `StreamPartitioner`. For example, for `keyBy`,
Flink uses:
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java
which uses
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
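As a rough illustration of what those two classes do together, here is a minimal standalone sketch: the key's `hashCode()` is scrambled, reduced modulo the max parallelism to get a key group, and the key group is then scaled down to a subtask index. The class name `KeyGroupSketch` and the method names are made up for this example, and the `mixHash` function is only a murmur-style stand-in written from memory, so please treat the linked source as authoritative. The values 128 and 4 are just example settings.

```java
// Sketch only: mirrors the idea behind KeyGroupRangeAssignment, not Flink's actual code.
final class KeyGroupSketch {

    // Stand-in for the hash scrambling step (assumption: murmur-style mixing).
    // Always returns a non-negative int so the modulo below stays in range.
    static int mixHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        // Final avalanche step.
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        if (code >= 0) {
            return code;
        }
        return code == Integer.MIN_VALUE ? 0 : -code;
    }

    // Step 1: key -> key group in [0, maxParallelism).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return mixHash(key.hashCode()) % maxParallelism;
    }

    // Step 2: key group -> subtask index in [0, parallelism).
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Both steps combined: key -> subtask index.
    static int subtaskForKey(Object key, int maxParallelism, int parallelism) {
        int keyGroup = assignToKeyGroup(key, maxParallelism);
        return operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup);
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // example value
        int parallelism = 4;      // example value
        for (int id = 1; id <= 5; id++) {
            Integer key = id;
            System.out.println("id " + id
                + " -> key group " + assignToKeyGroup(key, maxParallelism)
                + " -> subtask " + subtaskForKey(key, maxParallelism, parallelism));
        }
    }
}
```

The point of the intermediate key group is that the key-to-group mapping depends only on the max parallelism, so changing the operator parallelism only changes which contiguous range of key groups each subtask owns.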
Flink tries to avoid redistribution. Basically, redistribution only
occurs when performing a GROUP BY or when connected operators have
different parallelism. For Table API and SQL, you can print the
shuffling steps via `Table.explain()`. They are indicated by an
`Exchange` operation.
I hope this helps.
Regards,
Timo
On 16.01.21 19:45, Rex Fenley wrote:
Hello,
I'm wondering what sort of algorithm Flink uses to map an Integer ID to
a subtask when distributing data. Also, which operators from the Table API
cause data to be redistributed? I know Joins will; what about
Aggregates, Sources, Filters?
Thanks!
--
Rex Fenley|Software Engineer - Mobile and Backend