Re: Sporadic issues with savepoint status lookup in Flink 1.15

Peter Westermann Thu, 16 Jun 2022 08:00:06 -0700

If it happens, it happens immediately. Once we receive the triggerId from
/jobs/:jobid/stop or /jobs/:jobid/savepoints, we poll
/jobs/:jobid/savepoints/:triggerid every second until the status is no longer
IN_PROGRESS.
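Below is a minimal Python sketch of the polling loop described above. BASE_URL, fetch_status and poll_until_done are hypothetical names. The response shape ({"status": {"id": ...}}) is an assumption, so check it against the REST API docs for your Flink version. The fetch and sleep functions are injectable, which lets you exercise the loop without a cluster.

```python
import json
import time
import urllib.request
from typing import Callable

# Hypothetical JobManager REST address; replace with your cluster's.
BASE_URL = "http://localhost:8081"


def fetch_status(job_id: str, trigger_id: str) -> dict:
    """GET /jobs/:jobid/savepoints/:triggerid and decode the JSON body.

    A non-2xx reply (such as the 404 discussed in this thread) surfaces as
    urllib.error.HTTPError raised by urlopen.
    """
    url = f"{BASE_URL}/jobs/{job_id}/savepoints/{trigger_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def poll_until_done(
    job_id: str,
    trigger_id: str,
    fetch: Callable[[str, str], dict] = fetch_status,
    interval_s: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll every interval_s seconds until status.id is no longer IN_PROGRESS."""
    for _ in range(max_polls):
        body = fetch(job_id, trigger_id)
        # Assumed response shape: {"status": {"id": "IN_PROGRESS" | "COMPLETED"}, ...}
        if body["status"]["id"] != "IN_PROGRESS":
            return body
        sleep(interval_s)
    raise TimeoutError(f"trigger {trigger_id} still IN_PROGRESS after {max_polls} polls")
```

The max_polls bound is an illustrative safeguard so that a stuck operation does not block the client forever.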


Peter Westermann
Analytics Software Architect


From: Chesnay Schepler <[email protected]>
Date: Thursday, June 16, 2022 at 10:55 AM
To: Peter Westermann <[email protected]>, [email protected] 
<[email protected]>
Subject: Re: Sporadic issues with savepoint status lookup in Flink 1.15

________________________________
There is one expected case where this can happen: too much time has elapsed
since the savepoint completed (default 5 minutes; controlled by
rest.async.store-duration).
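Here is a toy Python model of a result store that keeps completed operation results for a fixed duration only. It illustrates why a lookup can legitimately fail after the savepoint has succeeded. The 5-minute value mirrors the rest.async.store-duration default stated above. ResultStore and its methods are hypothetical and do not reflect Flink's internal implementation. The point is that, once the retention window has passed, an expired result looks the same to the client as an unknown triggerId.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Toy model only: names are hypothetical, not Flink's internal classes.
STORE_DURATION_S = 5 * 60  # default retention stated above (rest.async.store-duration)


@dataclass
class ResultStore:
    store_duration_s: float = STORE_DURATION_S
    # trigger_id -> (completion time in seconds, result payload)
    _done: Dict[str, Tuple[float, dict]] = field(default_factory=dict)
    _running: Set[str] = field(default_factory=set)

    def start(self, trigger_id: str) -> None:
        self._running.add(trigger_id)

    def complete(self, trigger_id: str, now: float, result: dict) -> None:
        self._running.discard(trigger_id)
        self._done[trigger_id] = (now, result)

    def lookup(self, trigger_id: str, now: float) -> Optional[dict]:
        """Return a status dict, or None to model the 404 'no savepoint operation' case."""
        if trigger_id in self._running:
            return {"status": {"id": "IN_PROGRESS"}}
        entry = self._done.get(trigger_id)
        if entry is None:
            return None  # never seen (or already evicted)
        completed_at, result = entry
        if now - completed_at > self.store_duration_s:
            # Evicted: the result is gone even though the savepoint itself succeeded.
            del self._done[trigger_id]
            return None
        return {"status": {"id": "COMPLETED"}, "operation": result}
```

In this model, only lookups more than 300 seconds after completion return None. A 404 that appears right after triggering is therefore not covered by this case.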

Did this happen earlier than that?

On 16/06/2022 15:53, Peter Westermann wrote:
We recently upgraded one of our Flink clusters to version 1.15.0 and are now
seeing sporadic issues when stopping a job with a savepoint via the REST API.
This happens with both /jobs/:jobid/savepoints and /jobs/:jobid/stop: the job
finishes with a savepoint, but the triggerId returned from the REST API seems
to be invalid. Any lookup via /jobs/:jobid/savepoints/:triggerid fails with a
404 and the following error:

org.apache.flink.runtime.rest.handler.RestHandlerException: There is no 
savepoint operation with triggerId=cee5054245598efb42245b3046a6ae75 for job 
0995a9461f0178294ea71c9accbe750c
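The sketch below shows how a client might trigger a stop-with-savepoint and report the 404 above separately from other HTTP failures. Several details are assumptions based on the Flink REST API: the request body fields (targetDirectory, drain) and the "request-id" response field. Check them against the 1.15 docs. UnknownTriggerError, trigger_stop and lookup_status are hypothetical names. The opener parameter defaults to urllib.request.urlopen and can be replaced with a stub for testing.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

Opener = Callable[..., Any]  # anything with urlopen's (request_or_url, timeout=...) shape


class UnknownTriggerError(Exception):
    """Hypothetical error for the 404 'no savepoint operation with triggerId=...' case."""


def trigger_stop(base_url: str, job_id: str, target_dir: str,
                 opener: Opener = urllib.request.urlopen) -> str:
    """POST /jobs/:jobid/stop and return the triggerId (assumed field: "request-id")."""
    req = urllib.request.Request(
        f"{base_url}/jobs/{job_id}/stop",
        data=json.dumps({"targetDirectory": target_dir, "drain": False}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req, timeout=10) as resp:
        return json.load(resp)["request-id"]


def lookup_status(base_url: str, job_id: str, trigger_id: str,
                  opener: Opener = urllib.request.urlopen) -> dict:
    """GET /jobs/:jobid/savepoints/:triggerid, turning a 404 into UnknownTriggerError."""
    url = f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}"
    try:
        with opener(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Either the triggerId was never registered or its result was already evicted.
            raise UnknownTriggerError(
                f"no savepoint operation with triggerId={trigger_id} for job {job_id}"
            ) from err
        raise
```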


Peter Westermann
Analytics Software Architect
