[ 
https://issues.apache.org/jira/browse/FLINK-18312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136258#comment-17136258
 ] 

Yu Wang commented on FLINK-18312:
---------------------------------

I think there is an issue in "AbstractAsynchronousOperationHandlers". This
handler keeps a local in-memory cache, "completedOperationCache", which stores
the pending savepoint operation before the request is redirected to the leader
jobmanager. This cache does not appear to be synced between the jobmanagers, so
only the jobmanager that received the savepoint trigger request can look up the
status of the savepoint, while the others can only return 404.
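
To make the suspected failure mode concrete, here is a minimal toy model, assuming only what is described above: each jobmanager's REST endpoint keeps its own in-memory map of savepoint operations, and nothing replicates it. The class `RestEndpoint`, its methods, and the `operation_cache` field are hypothetical names for illustration, not Flink's actual API.

```python
# Toy model (hypothetical names, not Flink's API): each jobmanager's REST
# endpoint keeps its own in-memory cache of savepoint operations, similar in
# spirit to the "completedOperationCache" described above.
import uuid


class RestEndpoint:
    """One jobmanager's REST handler with a purely local operation cache."""

    def __init__(self, name):
        self.name = name
        self.operation_cache = {}  # trigger id -> status; never shared

    def trigger_savepoint(self):
        trigger_id = str(uuid.uuid4())
        # The status is recorded only on the endpoint that received the request.
        self.operation_cache[trigger_id] = "IN_PROGRESS"
        return trigger_id

    def savepoint_status(self, trigger_id):
        """Return (http_status, body); ids unknown to this instance map to 404."""
        if trigger_id in self.operation_cache:
            return 200, self.operation_cache[trigger_id]
        return 404, None


leader = RestEndpoint("leader")
standby = RestEndpoint("standby")

trigger_id = leader.trigger_savepoint()
print(leader.savepoint_status(trigger_id))   # (200, 'IN_PROGRESS')
print(standby.savepoint_status(trigger_id))  # (404, None)
```

Behind a Kubernetes service that balances across both jobmanagers, a status query has a chance of landing on the instance that never saw the trigger, which matches the 404 in the reproduction steps.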

> SavepointStatusHandler and StaticFileServerHandler not redirect 
> ----------------------------------------------------------------
>
>                 Key: FLINK-18312
>                 URL: https://issues.apache.org/jira/browse/FLINK-18312
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / REST
>    Affects Versions: 1.8.0, 1.9.0, 1.10.0
>         Environment: 1. Deploy a flink cluster in standalone mode on kubernetes 
> and use two Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
>            Reporter: Yu Wang
>            Priority: Major
>
> Savepoint:
> 1. Deploy our flink cluster in standalone mode on kubernetes and use two 
> Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
> 3. Send a savepoint trigger request to the leader Jobmanager.
> 4. Query the savepoint status from the leader Jobmanager: the response is correct.
> 5. Query the savepoint status from the standby Jobmanager: the response is 
> 404.
> Jobmanager log:
> 1. Query the log from the leader Jobmanager: the leader's log is returned.
> 2. Query the log from the standby Jobmanager: the standby's log is returned.
>  
> In 1.7, both of these requests were redirected to the leader.
>  
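
The 1.7 behaviour mentioned at the end of the report (both requests redirected to the leader) can be sketched as a toy model too. This is a sketch under stated assumptions: the redirect is modelled as a direct call from the standby to the leader, and the class `JobManager`, the request names `"trigger"`, `"status"` and `"log"`, and the trigger id `"sp-1"` are all hypothetical, not Flink's REST paths or leader-retrieval code.

```python
# Toy model (hypothetical, not Flink's API) of the 1.7 behaviour described in
# the report: a standby jobmanager hands REST requests to the leader, so every
# answer comes from the leader's local state. The redirect is modelled here as
# a direct method call.


class JobManager:
    def __init__(self, name, is_leader):
        self.name = name
        self.is_leader = is_leader
        self.leader = self    # standbys are pointed at the leader below
        self.savepoints = {}  # trigger id -> status, local to this instance

    def handle(self, request, trigger_id=None):
        """Return (http_status, body) for a hypothetical request type."""
        if not self.is_leader:
            # Standby: pass the request on to the leader instead of answering.
            return self.leader.handle(request, trigger_id)
        if request == "trigger":
            self.savepoints[trigger_id] = "IN_PROGRESS"
            return 202, trigger_id
        if request == "status":
            if trigger_id in self.savepoints:
                return 200, self.savepoints[trigger_id]
            return 404, None
        if request == "log":
            return 200, f"{self.name} log"
        return 400, None


leader = JobManager("leader", is_leader=True)
standby = JobManager("standby", is_leader=False)
standby.leader = leader

# "sp-1" is a made-up trigger id; the Kubernetes service may route to either.
leader.handle("trigger", "sp-1")
for jm in (leader, standby):
    print(jm.name, jm.handle("status", "sp-1"), jm.handle("log"))
```

In this model every request ends up on the leader, so only the leader's local state is ever consulted: the standby returns the savepoint status instead of 404, and a log query on the standby returns the leader's log, as the report describes for 1.7.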



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
