If it happens, it happens immediately. Once we receive the triggerId from /jobs/:jobid/stop or /jobs/:jobid/savepoints, we poll /jobs/:jobid/savepoints/:triggerid every second until the status is no longer IN_PROGRESS.
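The polling loop described above can be sketched in Python roughly as follows. This is a minimal sketch, not the actual client used here. The helper names (`http_fetch`, `poll_savepoint`, `UnknownTriggerError`) are hypothetical. It also assumes that the status response reports the state under `status.id` (`IN_PROGRESS` while running), which is how the Flink REST docs describe asynchronous operation results as far as I know. The fetch function is injected so the loop can be exercised without a running cluster, and a 404 is raised as its own exception so a vanished triggerId can be told apart from other failures.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

# A fetch function takes a REST path and returns (HTTP status code, parsed JSON body).
Fetch = Callable[[str], Tuple[int, Dict]]


class UnknownTriggerError(Exception):
    """Raised when the REST API answers 404 for a savepoint triggerId."""


def http_fetch(base_url: str) -> Fetch:
    """Return a Fetch that GETs paths relative to base_url (assumed, e.g. "http://localhost:8081")."""
    def fetch(path: str) -> Tuple[int, Dict]:
        try:
            with urllib.request.urlopen(base_url + path) as resp:
                return resp.status, json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Error responses (such as the 404 in this thread) may still carry a JSON body.
            raw = err.read().decode("utf-8")
            try:
                return err.code, json.loads(raw)
            except ValueError:
                return err.code, {"raw": raw}
    return fetch


def poll_savepoint(fetch: Fetch, job_id: str, trigger_id: str,
                   interval: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll /jobs/:jobid/savepoints/:triggerid until status.id is no longer IN_PROGRESS."""
    path = f"/jobs/{job_id}/savepoints/{trigger_id}"
    while True:
        code, body = fetch(path)
        if code == 404:
            # The symptom reported in this thread: the triggerId is unknown to the REST API.
            raise UnknownTriggerError(f"{path}: {body}")
        if code != 200:
            raise RuntimeError(f"unexpected HTTP {code} from {path}: {body}")
        if body.get("status", {}).get("id") != "IN_PROGRESS":
            # Finished (e.g. COMPLETED); the "operation" field should hold the location or failure.
            return body
        sleep(interval)
```

Against a live cluster this would be called as `poll_savepoint(http_fetch("http://localhost:8081"), job_id, trigger_id)`, with the base URL adjusted to wherever the REST endpoint actually listens.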
Peter Westermann
Analytics Software Architect

From: Chesnay Schepler <[email protected]>
Date: Thursday, June 16, 2022 at 10:55 AM
To: Peter Westermann <[email protected]>, [email protected] <[email protected]>
Subject: Re: Sporadic issues with savepoint status lookup in Flink 1.15

There is an expected case where this might happen: if too much time has elapsed since the savepoint was completed (default 5 minutes; controlled by rest.async.store-duration).

Did this happen earlier than that?

On 16/06/2022 15:53, Peter Westermann wrote:

We recently upgraded one of our Flink clusters to version 1.15.0 and are now seeing sporadic issues when stopping a job with a savepoint via the REST API. This happens for both /jobs/:jobid/savepoints and /jobs/:jobid/stop: the job finishes with a savepoint, but the triggerId returned from the REST API seems to be invalid. Any lookup via /jobs/:jobid/savepoints/:triggerid fails with a 404 and the following error:

org.apache.flink.runtime.rest.handler.RestHandlerException: There is no savepoint operation with triggerId=cee5054245598efb42245b3046a6ae75 for job 0995a9461f0178294ea71c9accbe750c
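Chesnay's question can be answered from the client side with a simple bound. A savepoint cannot complete before it is triggered, so the time elapsed since the trigger request is an upper bound on the time elapsed since completion. If that bound is still within rest.async.store-duration when the 404 arrives, eviction of the stored result cannot be the cause. The sketch below is hypothetical: the function name and the use of plain seconds (for example from a monotonic clock) are assumptions, and the 5-minute default is taken from this thread.

```python
# Default of rest.async.store-duration as stated in this thread (5 minutes), in seconds.
DEFAULT_STORE_DURATION_S = 5 * 60


def eviction_could_explain_404(triggered_at: float, failed_at: float,
                               store_duration_s: float = DEFAULT_STORE_DURATION_S) -> bool:
    """Return True only if the stored savepoint result might already have been evicted.

    triggered_at and failed_at are timestamps in seconds (e.g. from time.monotonic()).
    Time since the trigger bounds time since completion from above, so if even that
    bound is within the retention window, the 404 is not explained by eviction.
    """
    if failed_at < triggered_at:
        raise ValueError("failed_at must not precede triggered_at")
    return failed_at - triggered_at > store_duration_s
```

With the one-second polling described in the reply, the 404 arrives within seconds of the trigger, so this check returns False and the 5-minute retention window does not account for the failure.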
