Re: Approaches to customize the parallelism in SQL generated operators

eef hhj Mon, 22 Mar 2021 19:15:20 -0700

Hi David,

Thank you for the response. We are facing a cold-start situation in our
application. During the cold-start phase, a lot of parallelism is required
to keep the busiest operator from being overwhelmed, so that there is no
backpressure and checkpoints work as normal. The problem is that this
over-provisioned parallelism is far more than normal traffic from the
stream requires, which is quite a waste.


Currently, we're thinking of limiting the read rate on the connector
(Kafka) side. By limiting the throughput of each parallel source instance,
the downstream operators can handle the traffic during cold start. Per our
observation this works, but we are not sure whether it is the recommended
approach. Any other suggestions are appreciated.
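
For illustration only, the per-instance throttling idea could be sketched as
a token bucket. This is a generic Python sketch and not Flink or Kafka
connector API: the names TokenBucket and per_subtask_rate are hypothetical,
and it assumes a job-wide records-per-second budget is split evenly across
the source subtasks.

```python
import time


class TokenBucket:
    """Allows about `rate` records per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable clock, useful for testing
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n=1):
        """Consume n tokens and return True if available, otherwise return False."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n=1):
        """Seconds until n tokens are available (0.0 if they are available now)."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)


def per_subtask_rate(total_rate, source_parallelism):
    """Split a job-wide records/second budget evenly across source subtasks."""
    return total_rate / source_parallelism
```

Each source instance would call try_acquire() before emitting a record (or
sleep for wait_time() seconds), so the combined read rate stays near the
budget that the downstream operators can absorb during cold start.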

Another direction we want to explore is changing the parallelism of only
the source consumer, but not the subsequent operators. Are there any
concerns with this approach?

*-- Best wishes*
*Kai*


On Sun, Mar 21, 2021 at 1:01 AM David Anderson <dander...@apache.org> wrote:

> No, there is no mechanism available for individually tuning the
> parallelism of the generated operators in a SQL job. Moreover, such
> fine-tuning is often counter-productive. In most cases you are better off
> simply setting the overall parallelism to whatever is needed by the busiest
> operator(s). Unnecessary changes in parallelism force additional network
> shuffles (unless done in concert with a keyBy), and create an uneven
> distribution of load, with some slots having more operators than others.
>
> Regards,
> David
>
> On Thu, Mar 18, 2021 at 1:03 PM eef hhj <zzfu...@gmail.com> wrote:
>
>> Hi team,
>>
>> Currently the SQL-generated operators all have the same parallelism by
>> default, and we faced an issue where, in the case of multiple joins, the
>> operators at later stages face a larger computation load, so the overall
>> pipeline is backpressured and checkpoints occasionally fail (expire).
>>
>> We want to know whether there is any way to customize the parallelism of
>> the SQL-generated operators individually, so that we can match their
>> capacity to their actual load and distribute the load evenly.
>>
>> Apart from customizing the parallelism of the operators, is there any other
>> suggested way to solve the problem, and are there best practices for such
>> multiple joins? Thank you in advance.
>>
>> --
>> *Best regards,*
>> *- Kai*
>>
>

-- 
*Best regards,*
*- Kai*
