[jira] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

Gyula Fora (Jira) Mon, 04 Mar 2024 04:46:05 -0800


    [ https://issues.apache.org/jira/browse/FLINK-34566 ]



    Gyula Fora deleted comment on FLINK-34566:
    ------------------------------------

was (Author: gyfora):
{noformat}
A ThreadPoolExecutor will automatically adjust the pool size (see getPoolSize) 
according to the bounds set by corePoolSize (see getCorePoolSize) and 
maximumPoolSize (see getMaximumPoolSize). When a new task is submitted in 
method execute(Runnable), if fewer than corePoolSize threads are running, a new 
thread is created to handle the request, even if other worker threads are idle. 
Else if fewer than maximumPoolSize threads are running, a new thread will be 
created to handle the request only if the queue is full. By setting 
corePoolSize and maximumPoolSize the same, you create a fixed-size thread pool. 
By setting maximumPoolSize to an essentially unbounded value such as 
Integer.MAX_VALUE, you allow the pool to accommodate an arbitrary number of 
concurrent tasks. Most typically, core and maximum pool sizes are set only upon 
construction, but they may also be changed dynamically using setCorePoolSize 
and setMaximumPoolSize.{noformat}
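
The sizing rule quoted above can be checked with a small Java sketch. It assumes an unbounded LinkedBlockingQueue as the work queue, which the quoted text does not specify, and the class and method names are hypothetical. With an unbounded queue the "queue is full" condition never occurs, so the pool should stay at corePoolSize however large maximumPoolSize is.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {

    /** Submits `tasks` blocking tasks and returns the pool size observed afterwards. */
    public static int poolSizeAfterSubmitting(int core, int max, int tasks)
            throws InterruptedException {
        // Assumption: unbounded work queue, so it never reports "full".
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Every task blocks until released, so all workers stay busy.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // core=10, max=50, 100 tasks: 10 run, the other 90 wait in the queue.
        System.out.println(poolSizeAfterSubmitting(10, 50, 100)); // prints 10
    }
}
```

Under these assumptions, raising only maximumPoolSize has no effect on concurrency; only corePoolSize (or a bounded queue) changes how many tasks run at once.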

> Flink Kubernetes Operator reconciliation parallelism setting not work
> ---------------------------------------------------------------------
>
>                 Key: FLINK-34566
>                 URL: https://issues.apache.org/jira/browse/FLINK-34566
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.7.0
>            Reporter: Fei Feng
>            Priority: Blocker
>         Attachments: image-2024-03-04-10-58-37-679.png, 
> image-2024-03-04-11-17-22-877.png, image-2024-03-04-11-31-44-451.png
>
>
> After we upgraded JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, 
> we can no longer increase the reconciliation parallelism, and the maximum 
> reconciliation parallelism is only 10. This delays FlinkDeployment and 
> SessionJob reconciliation by about 10-30 seconds when we have a large-scale 
> Flink session cluster and session jobs in a k8s cluster.
>  
> After investigating and validating, I found that the cause is that the logic 
> for creating the reconciliation thread pool in JOSDK changed significantly 
> between these two versions. 
> In v4.3.0: 
> the reconciliation thread pool was created as a FixedThreadPool (maximumPoolSize 
> was the same as corePoolSize), so we passed the reconciliation thread count and 
> got a thread pool that matched our expectations.
> !image-2024-03-04-10-58-37-679.png|width=497,height=91!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]
>  
> But in v4.4.2: 
> the reconciliation thread pool is created as a custom executor, to which we 
> can pass both corePoolSize and maximumPoolSize. The problem is that we only 
> set the maximumPoolSize of the thread pool, while the corePoolSize defaults 
> to 10. As a result, the thread pool size is only 10 and the majority of events 
> are placed in the workQueue for a while.  
> !image-2024-03-04-11-17-22-877.png|width=569,height=112!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]
>  
> The solution is also simple: we can create the thread pool in the Flink 
> Kubernetes Operator and pass it to JOSDK, so that we control the 
> reconciliation thread pool directly, such as:
> !image-2024-03-04-11-31-44-451.png|width=483,height=98!
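
The screenshot above is not reproduced in this archive. As a rough illustration only, and not the actual patch, a fixed-size pool of the kind the description proposes could be built as below. The class name, the helper name and the value 50 are hypothetical, and how the pool is handed over to JOSDK is deliberately left out.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ReconcilerPoolSketch {

    /** Hypothetical helper: a pool whose core and maximum size both equal `parallelism`. */
    public static ThreadPoolExecutor fixedReconcilerPool(int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        // core == max gives a fixed-size pool, so the configured parallelism
        // is reached even though the work queue is unbounded.
        return new ThreadPoolExecutor(
                parallelism, parallelism,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = fixedReconcilerPool(50); // 50 is an example value
        System.out.println(pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

Building the pool on the operator side keeps the sizing independent of whatever default corePoolSize the SDK applies.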



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
