No, there is no mechanism available for individually tuning the parallelism of the operators generated for a SQL job. Moreover, such fine-tuning is often counter-productive. In most cases you are better off simply setting the overall parallelism to whatever the busiest operator(s) need. Unnecessary changes in parallelism force additional network shuffles (unless they happen at a keyBy, where the data is being shuffled anyway) and create an uneven distribution of load, with some slots running more operators than others.
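To make the "size for the busiest operator" advice concrete, here is a minimal back-of-the-envelope sketch. It is not a Flink API; the operator names, input rates and per-subtask capacities are hypothetical numbers you would have to estimate yourself. The first function picks the single job-wide parallelism that lets every operator keep up. The second uses a simplified slot-sharing model (one slot sharing group, so the job uses as many slots as its highest parallelism, and subtask i of every operator is assumed to land in slot i) to show why mixing parallelisms leaves some slots busier than others.

```python
import math


def job_parallelism(records_per_sec, capacity_per_subtask):
    """Smallest single parallelism at which every operator keeps up.

    Both dicts are keyed by (hypothetical) operator name:
      records_per_sec      - estimated input rate of the operator
      capacity_per_subtask - records/sec one subtask of it can process
    """
    return max(
        math.ceil(records_per_sec[op] / capacity_per_subtask[op])
        for op in records_per_sec
    )


def subtasks_per_slot(parallelisms):
    """Simplified slot-sharing model (an assumption, not Flink's exact
    scheduler): the job uses max(parallelism) slots and subtask i of each
    operator is placed in slot i. Returns the subtask count per slot."""
    slots = max(parallelisms.values())
    return [sum(1 for p in parallelisms.values() if p > s) for s in range(slots)]


if __name__ == "__main__":
    # Hypothetical estimates: the second join is the bottleneck.
    rates = {"source": 20_000, "join1": 30_000, "join2": 90_000}
    caps = {"source": 10_000, "join1": 10_000, "join2": 10_000}

    print(job_parallelism(rates, caps))  # 9

    # Per-operator tuning: the first slots carry three subtasks, the rest one.
    print(subtasks_per_slot({"source": 2, "join1": 3, "join2": 9}))
    # [3, 3, 2, 1, 1, 1, 1, 1, 1]

    # One uniform parallelism: every slot carries the same pipeline.
    print(subtasks_per_slot({"source": 9, "join1": 9, "join2": 9}))
    # [3, 3, 3, 3, 3, 3, 3, 3, 3]
```

With a uniform parallelism each slot runs one complete copy of the pipeline, so the load per slot stays even and the operators can stay chained without extra shuffles between them.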
Regards,
David

On Thu, Mar 18, 2021 at 1:03 PM eef hhj <zzfu...@gmail.com> wrote:
> Hi team,
>
> Currently, all operators generated from SQL get the same parallelism by
> default. We are facing an issue with multiple joins: the operators at later
> stages have more computation to do, so the whole pipeline is back-pressured
> and checkpoints occasionally fail (expire).
>
> We would like to know whether there is any way to customize the parallelism
> of the SQL-generated operators individually, so that each operator's
> capacity matches its actual load and the load is evenly distributed.
>
> Apart from customizing operator parallelism, is there any other suggested
> way to solve this problem, and are there best practices for such multiple
> joins? Thank you in advance.
>
> --
> *Best regards,*
> *- Kai*