I believe there is some noticeable overhead if you are using the
heap-based state backend, but with RocksDB I think the difference is
negligible.
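
To put rough numbers on the two effects discussed in the quoted messages below (the size of the key-group prefix, and how evenly key groups spread over subtasks), here is a minimal Python sketch. It is not Flink code. The function names are hypothetical, the prefix sizes are taken from the table in Hangxiang's reply (with values between the listed rows assumed to round up to the next row), and the range formula is my understanding of how contiguous key-group ranges are assigned to subtasks, so treat it as an assumption.

```python
def key_group_prefix_bytes(max_parallelism: int) -> int:
    """Bytes used for the key-group prefix of each stored key.

    Taken from the table in the quoted reply: 1 byte for up to 128 key
    groups, 2 bytes for up to 32768. Treating every value in between as
    2 bytes is an assumption of this sketch.
    """
    if not 1 <= max_parallelism <= 32768:
        raise ValueError("max_parallelism must be in [1, 32768]")
    return 1 if max_parallelism <= 128 else 2


def key_groups_per_subtask(max_parallelism: int, parallelism: int) -> list[int]:
    """Number of key groups owned by each subtask.

    Assumes each subtask gets one contiguous range of key groups and that
    the ranges are split as evenly as integer division allows.
    """
    if not 1 <= parallelism <= max_parallelism:
        raise ValueError("parallelism must be in [1, max_parallelism]")
    counts = []
    for i in range(parallelism):
        # First key group of subtask i: ceil(i * max_parallelism / parallelism)
        start = (i * max_parallelism + parallelism - 1) // parallelism
        # Last key group of subtask i (inclusive)
        end = ((i + 1) * max_parallelism - 1) // parallelism
        counts.append(end - start + 1)
    return counts


def imbalance(max_parallelism: int, parallelism: int) -> float:
    """Ratio between the largest and smallest per-subtask key-group count."""
    counts = key_groups_per_subtask(max_parallelism, parallelism)
    return max(counts) / min(counts)


if __name__ == "__main__":
    for max_p in (128, 32768):
        print(max_p, key_group_prefix_bytes(max_p), imbalance(max_p, 100))
```

With a parallelism of 100, a max parallelism of 128 leaves some subtasks with twice as many key groups as others, while 32768 keeps the per-subtask counts within one key group of each other, at the cost of one extra prefix byte per key.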

David

On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote:
>
> Hi, Tony.
> "be detrimental to performance" means that the extra space taken by the 
> key-group field may affect performance.
> As we know, Flink will write the key group as the prefix of the key to speed 
> up rescaling.
> So the format will be like: key group | key len | key | ......
> The number of bytes used for the key group depends on the max parallelism 
> as follows:
> ------------------------------------------
> max parallelism   bytes of key group
>       128                 1
>     32768                 2
> ------------------------------------------
> So I think the cost will be very small if the real key length >> 2 bytes.
>
> On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:
>>
>> Hi experts,
>>
>>> Setting the maximum parallelism to a very large value can be detrimental to 
>>> performance because some state backends have to keep internal data 
>>> structures that scale with the number of key-groups (which are the internal 
>>> implementation mechanism for rescalable state).
>>>
>>> Changing the maximum parallelism explicitly when recovery from original job 
>>> will lead to state incompatibility.
>>
>>
>> I read the section above in the official Flink documentation [1], and I'm 
>> wondering about the details of this side effect.
>>
>> Suppose that I have a Flink SQL job with large state, large parallelism and 
>> using RocksDB as my state backend.
>> I would like to set the max parallelism to 32768 so that I don't have to 
>> worry about whether the max parallelism is divisible by the parallelism 
>> whenever I scale my job, because the number of key groups will not differ 
>> much between subtasks.
>>
>> I'm wondering whether this is good practice, because according to the 
>> official documentation it is actually not recommended.
>> If possible, I would like to know the details of this side effect. Which 
>> state backends have this issue, and why?
>> Any advice would be appreciated. Thanks in advance.
>>
>> Best regards,
>> Tony Wei
>>
>> [1] 
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
>
>
>
> --
> Best,
> Hangxiang.
