[jira] [Commented] (FLINK-33940) Update the auto-derivation rule of max parallelism for enlarged upscaling space

Rui Fan (Jira) Tue, 02 Jan 2024 18:53:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801992#comment-17801992
 ]


Rui Fan commented on FLINK-33940:
---------------------------------

Hi [~Zhanghao Chen], IIUC flink-benchmarks[1] checks state-related 
performance, and we can see the performance changes in this web UI[2].

For example, valueGet.HEAP[3][4] measures valueState.get for the hashmap state backend.

!image-2024-01-03-10-52-05-861.png|width=949,height=359!

 

[1] https://github.com/apache/flink-benchmarks

[2] http://flink-speed.xyz/

[3] https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/state/benchmark/ValueStateBenchmark.java

[4] http://flink-speed.xyz/timeline/?ben=valueGet.HEAP&env=3

> Update the auto-derivation rule of max parallelism for enlarged upscaling 
> space
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-33940
>                 URL: https://issues.apache.org/jira/browse/FLINK-33940
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Core
>            Reporter: Zhanghao Chen
>            Assignee: Zhanghao Chen
>            Priority: Major
>         Attachments: image-2024-01-03-10-52-05-861.png
>
>
> *Background*
> The choice of the max parallelism of a stateful operator is important: it 
> sets the upper bound on the parallelism of the operator, but it can also 
> add extra overhead when set too large. Currently, the max parallelism 
> of an operator is either fixed to a value specified by API core / pipeline 
> option or auto-derived with the following rule:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)}}
> *Problem*
> Recently, the elasticity of Flink jobs has become increasingly valued by 
> users. The current auto-derived max parallelism was introduced a long time 
> ago and only allows the operator parallelism to be roughly doubled, which is 
> not ideal for elasticity. Setting a max parallelism manually may not be 
> desirable either: users may not have sufficient expertise to select a good 
> max-parallelism value.
> *Proposal*
> Update the auto-derivation rule of max parallelism to derive a larger max 
> parallelism for a better elasticity experience out of the box. A candidate is 
> as follows:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
> 32767)}}
> Looking forward to your opinions on this.
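
To make the difference between the two rules in the description concrete, here is a minimal Java sketch that evaluates both formulas for a few operator parallelisms. The class and method names are hypothetical and are not Flink APIs. The constants are taken from the formulas as written in the issue. Rounding the product up to an integer before taking the power of two is an assumption, and an integer-division reading of "* 1.5" can give a different result at some edge cases.

```java
// Illustrative sketch only: class and method names are hypothetical, not Flink APIs.
class MaxParallelismSketch {

    // Upper bound as written in the issue description.
    static final int UPPER_BOUND = 32767;

    // Smallest power of two that is >= x, for x >= 1.
    static int roundUpToPowerOfTwo(int x) {
        return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    // min(max(roundUpToPowerOfTwo(parallelism * factor), lowerBound), UPPER_BOUND)
    // Assumption: the product is rounded up to an integer first.
    static int derive(int parallelism, double factor, int lowerBound) {
        int scaled = (int) Math.ceil(parallelism * factor);
        return Math.min(Math.max(roundUpToPowerOfTwo(scaled), lowerBound), UPPER_BOUND);
    }

    // Current rule from the description: factor 1.5, lower bound 128.
    static int currentRule(int parallelism) {
        return derive(parallelism, 1.5, 128);
    }

    // Candidate rule from the description: factor 5, lower bound 1024.
    static int proposedRule(int parallelism) {
        return derive(parallelism, 5.0, 1024);
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 100, 1000, 10000}) {
            System.out.printf(
                    "parallelism=%d current=%d proposed=%d%n",
                    p, currentRule(p), proposedRule(p));
        }
    }
}
```

Under these assumptions, an operator with parallelism 100 gets a max parallelism of 256 from the current rule and 1024 from the candidate rule, so the headroom for upscaling grows from about 2.5x to about 10x.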



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
