Re: Flink ID hashing

Timo Walther Mon, 18 Jan 2021 01:38:37 -0800

Hi Rex,

for questions like this, I would recommend checking out the source code as well.

Search for subclasses of `StreamPartitioner`. For example, for keyBy Flink uses:


https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java

which uses

https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
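To make the ID-to-subtask mapping concrete, here is a minimal standalone sketch of the two steps as I read them in `KeyGroupRangeAssignment` and `MathUtils.murmurHash` on master. First, `key.hashCode()` is scrambled with a Murmur-style mix and taken modulo the max parallelism to pick a key group. Then that key group is mapped to a subtask by contiguous range. The class name `KeyGroupSketch`, the helper `subtaskForKey`, and the max parallelism of 128 in `main` are illustrative assumptions, not Flink API. Please check the linked sources for the version you actually run.

```java
// Hypothetical standalone re-implementation for illustration only;
// it mirrors (as I understand it) Flink's MathUtils.murmurHash and
// KeyGroupRangeAssignment, but is not Flink code.
public class KeyGroupSketch {

    // Murmur3-style finalizer applied to a single int hash code.
    // Always returns a non-negative value.
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;

        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;

        code ^= 4;
        code = bitMix(code);

        if (code >= 0) {
            return code;
        } else if (code != Integer.MIN_VALUE) {
            return -code;
        } else {
            return 0;
        }
    }

    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Step 1: key -> key group in [0, maxParallelism).
    // For an Integer ID, hashCode() is simply the int value.
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // Step 2: key group -> subtask index in [0, parallelism).
    // Key groups are split into contiguous ranges, one range per subtask.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Hypothetical convenience helper combining both steps.
    static int subtaskForKey(Object key, int maxParallelism, int parallelism) {
        return operatorIndexForKeyGroup(maxParallelism, parallelism,
                assignToKeyGroup(key, maxParallelism));
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for this example
        int parallelism = 4;
        for (int id = 0; id < 8; id++) {
            System.out.printf("id=%d keyGroup=%d subtask=%d%n",
                    id,
                    assignToKeyGroup(id, maxParallelism),
                    subtaskForKey(id, maxParallelism, parallelism));
        }
    }
}
```

Because the key group, not the subtask, is derived from the hash, changing the parallelism only moves whole key groups between subtasks. This is what allows keyed state to be rescaled without rehashing individual keys.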

Flink tries to avoid redistribution. Basically, redistribution only occurs when performing a GROUP BY or when operators have different parallelism. For Table API and SQL, you can print the shuffling steps via `Table.explain()`. They are indicated with an `Exchange` operation.


I hope this helps.

Regards,
Timo


On 16.01.21 19:45, Rex Fenley wrote:

Hello,
I'm wondering what sort of algorithm Flink uses to map an Integer ID to a subtask when distributing data. Also, what operators from the Table API cause data to be redistributed? I know Joins will, what about Aggregates, Sources, Filters?
Thanks!

--

Rex Fenley | Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>
