[ https://issues.apache.org/jira/browse/FLINK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823433#comment-17823433 ]
Gyula Fora edited comment on FLINK-34566 at 3/5/24 6:02 AM:
------------------------------------------------------------

Thanks for the detailed explanation, you are completely right, I missed this part. This actually sounds like a bug in JOSDK to me.

Can you open a PR to fix it on our side by replacing it with the fixed thread pool? (Or by overriding the min parallelism to the max value as well.)


was (Author: gyfora):
    Thanks for the detailed explanation, I missed this part. This actually sounds like a bug in JOSDK to me.

Can you open a PR to fix it on our side by replacing it with the fixed thread pool? (Or by overriding the min parallelism to the max value as well.)

> Flink Kubernetes Operator reconciliation parallelism setting not work
> ---------------------------------------------------------------------
>
>                 Key: FLINK-34566
>                 URL: https://issues.apache.org/jira/browse/FLINK-34566
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.7.0
>            Reporter: Fei Feng
>            Priority: Blocker
>         Attachments: image-2024-03-04-10-58-37-679.png,
> image-2024-03-04-11-17-22-877.png, image-2024-03-04-11-31-44-451.png
>
>
> After we upgraded JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005,
> we can no longer increase the reconciliation parallelism; the effective
> maximum reconciliation parallelism is only 10. As a result, reconciliation
> of FlinkDeployment and SessionJob resources is delayed by about 10-30
> seconds when a k8s cluster runs large-scale Flink session clusters and
> session jobs.
>
> After investigating and validating, I found that the cause is that the
> logic for creating the reconciliation thread pool in JOSDK changed
> significantly between these two versions.
> v4.3.0:
> The reconciliation thread pool was created as a FixedThreadPool
> (maximumPoolSize was the same as corePoolSize), so we pass the number of
> reconciliation threads and get a thread pool that matches our expectations.
> !image-2024-03-04-10-58-37-679.png|width=497,height=91!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]
>
> But in v4.4.2:
> The reconciliation thread pool is created as a custom executor to which we
> can pass both corePoolSize and maximumPoolSize. The problem is that we only
> set the maximumPoolSize of the thread pool, while the corePoolSize defaults
> to 10. As a result, the thread pool size stays at only 10 and the majority
> of events are placed in the workQueue for a while.
> !image-2024-03-04-11-17-22-877.png|width=569,height=112!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]
>
> The solution is also simple: we can create and pass the thread pool in the
> Flink Kubernetes Operator so that we control the reconciliation thread pool
> directly, such as:
> !image-2024-03-04-11-31-44-451.png|width=483,height=98!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
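
For readers without access to the attached screenshots, the thread pool behavior described in the issue can be reproduced with plain java.util.concurrent, independent of JOSDK. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the work queue is assumed to be unbounded, and the values 50 and 100 are arbitrary (only the default core size of 10 comes from the issue). A ThreadPoolExecutor starts threads beyond corePoolSize only when its queue rejects a task, so with an unbounded queue a larger maximumPoolSize alone has no effect.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of JOSDK or the Flink Kubernetes Operator.
public class ReconcilerPoolSketch {

    /**
     * Submits `tasks` blocking tasks to a pool backed by an unbounded queue
     * (an assumption for this sketch) and returns the largest number of
     * threads the pool ever started.
     */
    static int threadsStarted(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    // Keep the worker busy until all tasks have been submitted.
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int started = pool.getLargestPoolSize();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return started;
    }

    public static void main(String[] args) {
        // Only the maximum is raised: the pool stays at the core size.
        System.out.println(threadsStarted(10, 50, 100)); // prints 10
        // Core size equals maximum (fixed pool): all 50 threads are used.
        System.out.println(threadsStarted(50, 50, 100)); // prints 50
    }
}
```

The second call corresponds to both options raised in the comment: a fixed thread pool, or a minimum parallelism overridden to the same value as the maximum.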