Hi experts,

> Setting the maximum parallelism to a very large value can be detrimental to
> performance because some state backends have to keep internal data
> structures that scale with the number of key-groups (which are the internal
> implementation mechanism for rescalable state).
>
> Changing the maximum parallelism explicitly when recovery from original
> job will lead to state incompatibility.
>

I read the section above in the official Flink documentation [1], and I'm
wondering about the details of this side effect.

Suppose that I have a Flink SQL job with large state and high parallelism,
using RocksDB as my state backend.
I would like to set the max parallelism to 32768, so that I don't have to
worry about whether the parallelism evenly divides the max parallelism
whenever I want to rescale my job,
because the number of key groups will then not differ much between
subtasks.
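To make my reasoning concrete, here is a small sketch of how I understand the key-group-to-subtask assignment to work (modeled on Flink's published rescalable-state design, i.e. `KeyGroupRangeAssignment`; this is my own simplified model, not the actual Flink code):

```python
# Simplified model of Flink's key-group assignment (assumption: it matches
# the documented KeyGroupRangeAssignment behavior, where a key group is
# mapped to a subtask by proportional scaling).

def key_group_to_operator(key_group: int, max_parallelism: int,
                          parallelism: int) -> int:
    # Proportional mapping: key_group * parallelism / max_parallelism.
    return key_group * parallelism // max_parallelism

def key_groups_per_subtask(max_parallelism: int,
                           parallelism: int) -> list[int]:
    # Count how many key groups land on each subtask.
    counts = [0] * parallelism
    for kg in range(max_parallelism):
        counts[key_group_to_operator(kg, max_parallelism, parallelism)] += 1
    return counts

# With max parallelism 32768 and parallelism 100 (which does not divide it),
# each subtask still gets 327 or 328 key groups, i.e. load differs by at
# most one group out of ~328.
print(max(key_groups_per_subtask(32768, 100)) -
      min(key_groups_per_subtask(32768, 100)))

# With a small max parallelism of 128 and parallelism 100, some subtasks
# get 2 key groups and others only 1, a 2x relative skew.
print(max(key_groups_per_subtask(128, 100)) /
      min(key_groups_per_subtask(128, 100)))
```

This is why I expect a large max parallelism to give a more even load when rescaling to "awkward" parallelisms, and why I'm asking about the cost side of the trade-off.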

I'm wondering whether this is good practice, because the official
documentation actually seems to recommend against it.
If possible, I would like to know the details of this side effect: which
state backends have this issue, and why?
Please give me some advice. Thanks in advance.

Best regards,
Tony Wei

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
