Re: Handling skewness and Heterogeneity

Anis Nasir Wed, 15 Feb 2017 02:44:45 -0800

Dear Fabian,

Could you have a look at this issue? What actions would be required to
resolve it?


https://issues.apache.org/jira/browse/FLINK-1725

Regards,
Anis



On Wed, Feb 15, 2017 at 6:36 PM, Fabian Hueske <[email protected]> wrote:

> Hi Anis,
>
> Flink uses regular hash-partitioning to shuffle records and does not have a
> mechanism to counter data skew (other than scaling out).
> Heterogeneous hardware can (to some extent) be addressed by adapting the
> number of processing slots (or task managers) per machine, i.e., configure
> fewer slots on machines with lower performance.
>
> Best, Fabian
>
> 2017-02-15 2:12 GMT+01:00 Anis Nasir <[email protected]>:
>
> > Dear All,
> >
> > I have a few use cases for Flink streaming where the cluster consists of
> > heterogeneous machines.
> >
> > Additionally, there is skew in both the input distribution (e.g.,
> > each tuple is drawn from a Zipf distribution) and the service time (e.g.,
> > the service time required for each tuple comes from a Zipf distribution).
> >
> > I want to know how Flink will handle such use cases, assuming that the
> > distributions of both the workload and the cluster are not known a priori.
> >
> > Any help will be highly appreciated!
> >
> >
> > Regards,
> > Anis
> >
>
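
Fabian's point about hash-partitioning and skew can be illustrated with a
small simulation. The sketch below is not Flink code: the function names, the
CRC32 hash, and all parameters (number of keys, Zipf exponent, parallelism)
are assumptions chosen for illustration, and Flink's actual key-to-subtask
assignment is more involved. It only shows why partitioning by key hash
cannot balance load when a few keys dominate the input.

```python
import random
import zlib
from collections import Counter


def zipf_keys(num_records, num_keys, exponent, seed):
    """Draw keys 1..num_keys with probability proportional to 1 / rank**exponent."""
    rng = random.Random(seed)
    ranks = list(range(1, num_keys + 1))
    weights = [1.0 / (r ** exponent) for r in ranks]
    return rng.choices(ranks, weights=weights, k=num_records)


def hash_partition(key, parallelism):
    """Map a key to a partition with a stable hash (a stand-in, not Flink's hashing)."""
    return zlib.crc32(str(key).encode("utf-8")) % parallelism


def partition_loads(keys, parallelism):
    """Count how many records each partition receives."""
    counts = Counter(hash_partition(k, parallelism) for k in keys)
    return [counts.get(p, 0) for p in range(parallelism)]


if __name__ == "__main__":
    keys = zipf_keys(num_records=100_000, num_keys=1_000, exponent=1.5, seed=42)
    loads = partition_loads(keys, parallelism=8)
    mean = sum(loads) / len(loads)
    print("records per partition:", loads)
    print("max/mean imbalance: %.2f" % (max(loads) / mean))
```

All records with the same key land on the same partition, so the partition
that receives the most frequent key carries at least that key's share of the
stream, regardless of how many partitions exist. Under these assumptions,
adding partitions lowers the average load per partition but not the load on
the hottest one.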
