OK, that shouldn't happen. I haven't found anything wrong in the code so far; I'll keep trying to reproduce it.

If this happens, does it persist indefinitely for a particular triggerId, or does it reappear later on again?
Are you only ever triggering a single savepoint for a given job?

Are you using session or application clusters?

On 16/06/2022 16:59, Peter Westermann wrote:

If it happens, it happens immediately. Once we receive the triggerId from */jobs/:jobid/stop* or */jobs/:jobid/savepoints*, we poll */jobs/:jobid/savepoints/:triggerid* every second until the status is no longer IN_PROGRESS.
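
For readers following along, the polling loop described above might look roughly like the sketch below. It is not the code used in this thread: the names `fetch_status`, `poll_until_done`, and `SavepointNotFound` are made up for illustration, and the response shape (`{"status": {"id": ...}}`) is an assumption about the status endpoint's JSON body.

```python
import json
import time
import urllib.error
import urllib.request


class SavepointNotFound(Exception):
    """Hypothetical error type: the status endpoint answered 404."""


def fetch_status(base_url, job_id, trigger_id):
    """GET /jobs/:jobid/savepoints/:triggerid and return the parsed JSON body."""
    url = f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # The case reported in this thread: the triggerId is unknown.
            raise SavepointNotFound(
                f"no savepoint operation for triggerId={trigger_id}"
            ) from err
        raise


def poll_until_done(fetch, interval=1.0, max_polls=600, sleep=time.sleep):
    """Call fetch() until the status is no longer IN_PROGRESS, then return the body.

    `fetch` is any zero-argument callable returning the parsed status body;
    passing it in keeps the loop testable without a running cluster.
    """
    for _ in range(max_polls):
        body = fetch()
        # Assumed response shape: {"status": {"id": "IN_PROGRESS" | ...}, ...}
        if body["status"]["id"] != "IN_PROGRESS":
            return body
        sleep(interval)
    raise TimeoutError(f"still IN_PROGRESS after {max_polls} polls")
```

Usage would be along the lines of `poll_until_done(lambda: fetch_status("http://localhost:8081", job_id, trigger_id))`, where the base URL is a placeholder. A `SavepointNotFound` on the very first poll corresponds to the behaviour described in this message.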

Peter Westermann

Analytics Software Architect

peter.westerm...@genesys.com <mailto:peter.westerm...@genesys.com>

*From: *Chesnay Schepler <ches...@apache.org>
*Date: *Thursday, June 16, 2022 at 10:55 AM
*To: *Peter Westermann <no.westerm...@genesys.com>, user@flink.apache.org <user@flink.apache.org>
*Subject: *Re: Sporadic issues with savepoint status lookup in Flink 1.15

------------------------------------------------------------------------

There is one expected case where this can happen: too much time has elapsed since the savepoint completed (5 minutes by default; controlled by rest.async.store-duration).

Did this happen earlier than that?
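
To make the timing question concrete, here is a small sketch of the check a client could run to decide whether a 404 falls into the expected case. The function name and the strict `>` comparison are assumptions for illustration only; the 5-minute default is the value mentioned above, and this does not reproduce Flink's internal logic.

```python
from datetime import datetime, timedelta, timezone

# Default from the message above; rest.async.store-duration can change it.
DEFAULT_STORE_DURATION = timedelta(minutes=5)


def result_may_have_expired(completed_at, looked_up_at,
                            store_duration=DEFAULT_STORE_DURATION):
    """Return True if the lookup came later than the store duration after completion.

    A 404 in that window is the expected case; a 404 before it is not.
    """
    return looked_up_at - completed_at > store_duration


completed = datetime(2022, 6, 16, 15, 0, tzinfo=timezone.utc)
print(result_may_have_expired(completed, completed + timedelta(seconds=1)))
print(result_may_have_expired(completed, completed + timedelta(minutes=6)))
```

A lookup one second after completion is well inside the window, so a 404 at that point would not be explained by expiry.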

On 16/06/2022 15:53, Peter Westermann wrote:

    We recently upgraded one of our Flink clusters to version 1.15.0
    and are now seeing sporadic issues when stopping a job with a
    savepoint via the REST API. This happens for
    */jobs/:jobid/savepoints* and */jobs/:jobid/stop*:

    The job finishes with a savepoint but the triggerId returned from
    the REST API seems to be invalid. Any lookups via
    */jobs/:jobid/savepoints/:triggerid* fail with a 404 and the
    following error:

    org.apache.flink.runtime.rest.handler.RestHandlerException: There
    is no savepoint operation with
    triggerId=cee5054245598efb42245b3046a6ae75 for job
    0995a9461f0178294ea71c9accbe750c

    Peter Westermann

    Analytics Software Architect

    peter.westerm...@genesys.com <mailto:peter.westerm...@genesys.com>
