[ 
https://issues.apache.org/jira/browse/FLINK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823433#comment-17823433
 ] 

Gyula Fora edited comment on FLINK-34566 at 3/5/24 6:02 AM:
------------------------------------------------------------

Thanks for the detailed explanation. You are completely right, I missed this 
part. This actually sounds like a bug in JOSDK to me. Can you open a PR to fix 
it on our side by replacing it with a fixed thread pool? (Or by overriding the 
min parallelism to the max value as well.)


was (Author: gyfora):
Thanks for the detailed explanation, I missed this part. Sounds actually like a 
bug to me in JOSDK. Can you open a PR to fix it on our part by replacing it 
with the fixed thread pool? (Or overriding the min parallelism as well to the 
max value )

> Flink Kubernetes Operator reconciliation parallelism setting not work
> ---------------------------------------------------------------------
>
>                 Key: FLINK-34566
>                 URL: https://issues.apache.org/jira/browse/FLINK-34566
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.7.0
>            Reporter: Fei Feng
>            Priority: Blocker
>         Attachments: image-2024-03-04-10-58-37-679.png, 
> image-2024-03-04-11-17-22-877.png, image-2024-03-04-11-31-44-451.png
>
>
> After we upgraded JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, 
> we can no longer increase the reconciliation parallelism: the effective 
> maximum is only 10. As a result, reconciliation of FlinkDeployment and 
> SessionJob resources is delayed by about 10-30 seconds when we run 
> large-scale Flink session clusters and session jobs in a k8s cluster.
>  
> After investigating and validating, I found that the cause is that the logic 
> for creating the reconciliation thread pool in JOSDK changed significantly 
> between these two versions. 
> v4.3.0: 
> the reconciliation thread pool was created as a FixedThreadPool 
> (maximumPoolSize was the same as corePoolSize), so we passed the 
> reconciliation thread count and got a thread pool that matched our 
> expectations.
> !image-2024-03-04-10-58-37-679.png|width=497,height=91!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]
>  
> but in v4.4.2:
> the reconciliation thread pool is created as a custom executor, for which 
> both corePoolSize and maximumPoolSize can be passed. The problem is that we 
> only set maximumPoolSize, while corePoolSize defaults to 10. As a result, the 
> thread pool size stays at 10 and the majority of events sit in the workQueue 
> for a while.  
> !image-2024-03-04-11-17-22-877.png|width=569,height=112!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]
>  
> The solution is also simple: we can create and pass our own thread pool in 
> the Flink Kubernetes Operator, so that we control the reconciliation thread 
> pool directly, for example:
> !image-2024-03-04-11-31-44-451.png|width=483,height=98!
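
The behaviour described above can be reproduced with plain
java.util.concurrent, without JOSDK. The sketch below is only an illustration:
the class and method names are hypothetical, and it assumes the executor is
backed by an unbounded work queue, which is what the queued events in the
report suggest. With such a queue, a ThreadPoolExecutor only starts threads
beyond corePoolSize when the queue rejects a task, so maximumPoolSize is never
reached.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of JOSDK or the Flink Kubernetes Operator.
class PoolGrowthDemo {

    // Submits `tasks` blocking tasks and returns how many worker threads
    // the pool has started while all tasks are still running or queued.
    static int threadsStarted(int core, int max, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>()); // unbounded queue (assumption)
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // hold the thread, like a slow reconcile
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int started = pool.getPoolSize();
        release.countDown(); // let every task finish
        pool.shutdown();
        return started;
    }

    public static void main(String[] args) {
        // corePoolSize 10, maximumPoolSize 50, 200 pending tasks
        System.out.println(threadsStarted(10, 50, 200)); // prints 10
    }
}
```

Even with maximumPoolSize set to 50, only 10 threads run and the remaining 190
tasks wait in the queue, which matches the delay reported in the issue.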

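A minimal sketch of the proposed fix, assuming the operator builds the pool
itself with corePoolSize equal to maximumPoolSize. The class and helper names
are hypothetical, this is not the code from the attached screenshot, and how
the pool is handed over to JOSDK is not shown here.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual operator code.
class ReconcilerPoolSketch {

    // Builds a fixed-size pool: core and max are both set to the configured
    // reconciliation parallelism, so the pool can grow to that size.
    static ThreadPoolExecutor reconcilerPool(int parallelism) {
        return new ThreadPoolExecutor(
                parallelism, parallelism, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = reconcilerPool(50);
        System.out.println(
                pool.getCorePoolSize() + " / " + pool.getMaximumPoolSize()); // prints 50 / 50
        pool.shutdown();
    }
}
```

Setting both sizes to the same value is equivalent to the other option
mentioned in the comment, overriding the min parallelism to the max value.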


