Re: Handling skewness and Heterogeniety

2017-02-15 Thread Anis Nasir
Dear Fabian, Could you have a look at this issue? What actions would be required to resolve it? https://issues.apache.org/jira/browse/FLINK-1725 Regards, Anis On Wed, Feb 15, 2017 at 6:36 PM, Fabian Hueske wrote: > Hi Anis, > > Flink uses regular hash-partitioning to shuffle records an

Re: Handling skewness and Heterogeniety

2017-02-15 Thread Fabian Hueske
Hi Anis, Flink uses regular hash-partitioning to shuffle records and does not have a mechanism to counter data skew (other than scaling out). Heterogeneous hardware can (to some extent) be addressed by adapting the number of processing slots (or task managers) per machine, i.e., configure fewer sl
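
To illustrate why plain hash-partitioning does not counter skew, here is a minimal sketch in Python. It is a toy simulation, not Flink's actual partitioner: the function names, the modulo-based partitioning, and the parameters (1000 keys, 100,000 records, parallelism 8, Zipf exponent 1.0) are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def zipf_keys(num_keys, num_records, exponent=1.0, seed=42):
    """Draw integer keys 0..num_keys-1 with Zipf-like weights 1 / rank**exponent."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=num_records)

def hash_partition_load(keys, parallelism):
    """Count records per partition when each key maps to key % parallelism.

    The modulo stands in for a hash function; a real system would hash the key first.
    """
    counts = Counter(key % parallelism for key in keys)
    return [counts.get(i, 0) for i in range(parallelism)]

def imbalance(loads):
    """Ratio of the most loaded partition to the average load (1.0 means balanced)."""
    return max(loads) / (sum(loads) / len(loads))

keys = zipf_keys(num_keys=1000, num_records=100_000)
loads = hash_partition_load(keys, parallelism=8)
print("records per partition:", loads)
print("imbalance (max / mean):", round(imbalance(loads), 2))
```

Because every record with the same key goes to the same partition, the partition that owns the most frequent key carries a disproportionate share of the records, and raising the parallelism cannot split that single key's load.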

Handling skewness and Heterogeniety

2017-02-14 Thread Anis Nasir
Dear All, I have a few use cases for Flink streaming where the cluster consists of heterogeneous machines. Additionally, there is skew present in both the input distribution (e.g., each tuple is drawn from a Zipf distribution) and the service time (e.g., service time required for each tuple comes fro