[jira] [Commented] (FLINK-26370) Make Flink cluster communication asynchronous

Gyula Fora (Jira) Fri, 04 Mar 2022 00:11:21 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-26370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501211#comment-17501211
 ]


Gyula Fora commented on FLINK-26370:
------------------------------------

Getting the job status for a running cluster should take a few seconds at most. 
Without a spec change (upgrade), the reconcile loop should not run very often 
(let's say once a minute), as the jobs are self-recovering anyway. For spec 
changes you need to know the status, as some operations cannot really be done if 
the job is not in a RUNNING state.
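
To make the decision above concrete, here is a minimal sketch. The class 
ReconcileDecision, its enums, and the IDLE_INTERVAL constant are all 
hypothetical names used for illustration, not operator code. It also 
simplifies by treating every spec change as requiring a RUNNING job, whereas 
in practice only some operations do.

```java
import java.time.Duration;

/** Hypothetical sketch of the reconcile decision; not the operator's actual logic. */
final class ReconcileDecision {
    /** Simplified, made-up subset of job states; this is not Flink's JobStatus enum. */
    enum JobState { RUNNING, RESTARTING, FAILED, UNKNOWN }

    enum Action { RESCHEDULE, APPLY_SPEC_CHANGE, WAIT_FOR_RUNNING }

    /** Assumed idle interval, taken from the "once a minute" suggestion. */
    static final Duration IDLE_INTERVAL = Duration.ofMinutes(1);

    static Action decide(boolean specChanged, JobState observed) {
        if (!specChanged) {
            // Jobs recover on their own, so simply check back after IDLE_INTERVAL.
            return Action.RESCHEDULE;
        }
        // A spec change is only applied when the freshly observed state is RUNNING.
        return observed == JobState.RUNNING
                ? Action.APPLY_SPEC_CHANGE
                : Action.WAIT_FOR_RUNNING;
    }
}
```

Because the status is only needed at the moment a spec change is applied, 
the sketch takes the observed state as a parameter rather than reading it 
from a cache.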

Given all of this, I don't really see the point in caching the status, but maybe 
I am missing something here.

I would rather increase the thread pool size and make it easily configurable. 
Savepoints are pretty easy to make fully async, I think, as I commented on the 
ticket, without risking an inconsistent deployment state. But in general, 
operations like suspend or upgrade can be much trickier to do in a fully async way.
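
As a rough illustration of the thread pool suggestion, the sketch below 
sizes a fixed executor from a configurable value and runs a blocking call 
(for example a savepoint trigger) without blocking the caller. The class 
AsyncClusterCalls, the poolSize parameter, and the submit method are 
hypothetical; they are not part of the operator or of the Flink REST client 
API. It assumes Java 9 or later for CompletableFuture.orTimeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper that runs blocking cluster calls on a configurable pool. */
final class AsyncClusterCalls implements AutoCloseable {
    private final ExecutorService pool;

    /** poolSize would come from operator configuration (assumed, not an existing option). */
    AsyncClusterCalls(int poolSize) {
        if (poolSize < 1) {
            throw new IllegalArgumentException("poolSize must be at least 1");
        }
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    /** Runs the blocking call on the pool; the future fails if the timeout elapses. */
    <T> CompletableFuture<T> submit(Supplier<T> blockingCall, long timeout, TimeUnit unit) {
        return CompletableFuture.supplyAsync(blockingCall, pool).orTimeout(timeout, unit);
    }

    @Override
    public void close() {
        // Interrupts any calls still in flight and stops accepting new ones.
        pool.shutdownNow();
    }
}
```

A caller could submit the savepoint trigger, return from the reconcile loop, 
and pick up the result on a later pass. Suspend and upgrade are left out on 
purpose, since they are the operations that are harder to make fully async.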

> Make Flink cluster communication asynchronous
> ---------------------------------------------
>
>                 Key: FLINK-26370
>                 URL: https://issues.apache.org/jira/browse/FLINK-26370
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Assignee: Sandor Kelemen
>            Priority: Major
>
> In the current architecture, calls to the Flink clusters (through the REST 
> client) are made synchronously from the reconcile loop. 
> These calls often take a long time for various (completely normal) reasons:
>  - Cluster is not ready -> long call + timeout exception
>  - Operation takes a long time -> cancel/savepoint operations are often 
> expected to take seconds/minutes
> Both the observer and reconciler components make these calls.
> We should come up with a way to avoid making these sync calls from the main 
> loop while still preserving the logic of the operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
