Re: Approaches to customize the parallelism in SQL generated operators

David Anderson Sat, 20 Mar 2021 10:01:10 -0700

No, there is no mechanism available for individually tuning the parallelism
of the generated operators in a SQL job. Moreover, such fine-tuning is
often counter-productive. In most cases you are better off simply setting
the overall parallelism to whatever is needed by the busiest operator(s).
Unnecessary changes in parallelism force additional network shuffles
(unless done in concert with a keyBy), and they create an uneven
distribution of load, with some slots running more operator subtasks than
others.
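
To make those two effects concrete, here is a deliberately simplified Python
model. It is not Flink code and not Flink's actual scheduler. It assumes that,
absent an explicit partitioner, equal parallelism gives a forward connection
and a parallelism change gives a rebalance, while a keyBy always
hash-partitions. It also assumes that with slot sharing the job uses
max(parallelism) slots and that subtask i of each operator lands in slot i.
The function names are hypothetical.

```python
def default_edge(up_parallelism: int, down_parallelism: int, keyed: bool) -> str:
    """Simplified model of the connection chosen between two operators
    when no partitioner is specified explicitly."""
    if keyed:
        # keyBy hash-partitions, so records are shuffled whatever the parallelism
        return "hash (network shuffle)"
    if up_parallelism == down_parallelism:
        # equal parallelism: one-to-one forwarding, operators can be chained
        return "forward (no shuffle)"
    # parallelism changed without a keyBy: round-robin over downstream subtasks
    return "rebalance (network shuffle)"


def subtasks_per_slot(parallelisms: dict) -> list:
    """Simplified slot-sharing model: the job needs max(parallelism) slots,
    and subtask i of every operator is placed in slot i."""
    load = [0] * max(parallelisms.values())
    for p in parallelisms.values():
        for slot in range(p):
            load[slot] += 1
    return load


if __name__ == "__main__":
    # Uniform parallelism: forward edge, every slot runs the same subtasks.
    print(default_edge(4, 4, keyed=False))  # forward (no shuffle)
    print(subtasks_per_slot({"source": 4, "join1": 4, "join2": 4, "sink": 4}))
    # [4, 4, 4, 4]

    # Only the last join raised to 8: extra shuffle and lopsided slots.
    print(default_edge(4, 8, keyed=False))  # rebalance (network shuffle)
    print(subtasks_per_slot({"source": 4, "join1": 4, "join2": 8, "sink": 4}))
    # [4, 4, 4, 4, 1, 1, 1, 1]
```

In the second case four slots each host four subtasks while the other four
host only one, which is the imbalance described above. A join input is
already keyed, so raising parallelism at that edge alone adds no shuffle,
but the slot imbalance remains.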


Regards,
David

On Thu, Mar 18, 2021 at 1:03 PM eef hhj <zzfu...@gmail.com> wrote:

> Hi team,
>
> Currently all the SQL-generated operators have the same parallelism by
> default, and we have run into an issue: in the case of multiple joins, the
> operators at later stages face heavier computation, so the whole pipeline
> becomes backpressured and checkpoints occasionally fail (expire).
>
> We would like to know if there is any way to customize the parallelism of
> the SQL-generated operators individually, so that we can match each
> operator's capacity to its actual load and distribute the load evenly.
>
> Apart from customizing the parallelism of the operators, is there any other
> recommended way to solve this problem, or are there best practices for such
> multiple joins? Thank you in advance.
>
> --
> *Best regards,*
> *- Kai*
>
