Hi, Tony.
"be detrimental to performance" means that some extra space overhead of the
field of the key-group may influence performance.
As we know, Flink will write the key group as the prefix of the key to
speed up rescaling.
So the format will be like: key group | key len | key | ......
You could check the relationship between max parallelism and bytes of key
group as below:
------------------------------------------
max parallelism   bytes of key group
       128                    1
      32768                 2
------------------------------------------
So I think the cost will be very small if the real key length >> 2 bytes.

On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:

> Hi experts,
>
> Setting the maximum parallelism to a very large value can be detrimental
>> to performance because some state backends have to keep internal data
>> structures that scale with the number of key-groups (which are the internal
>> implementation mechanism for rescalable state).
>>
>> Changing the maximum parallelism explicitly when recovery from original
>> job will lead to state incompatibility.
>>
>
> I read the section above from Flink official document [1], and I'm
> wondering what the detail is regarding to the side-effect.
>
> Suppose that I have a Flink SQL job with large state, large parallelism
> and using RocksDB as my state backend.
> I would like to set the max parallelism as 32768, so that I don't bother
> if the max parallelism can be divided by the parallelism whenever I want to
> scale my job,
> because the number of key groups will not differ too much between each
> subtask.
>
> I'm wondering if this is a good practice, because based on the
> official document it is not recommended actually.
> If possible, I would like to know the detail about this side-effect. Which
> state backend will have this issue? and Why?
> Please give me an advice. Thanks in advance.
>
> Best regards,
> Tony Wei
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
>


-- 
Best,
Hangxiang.

Reply via email to