I think the idea of filtering is interesting, but I wonder whether we should introduce it as part of this FLIP. It seems like something we'd probably want to introduce consistently across all checkpoint-related endpoints.

I'm also not sure about returning a 404 if no checkpoints exist (especially with filtering) but the job is there. It's a bit annoying to handle on the client side, especially since there are other causes of a 404, and it can happen spuriously without any issue on the client side (e.g., when the job is still initializing or has just started, or when the JM has restarted and lost the checkpoint history; I'm not sure whether the checkpoint we restore from is included in there). As an alternative, the response could be either latest:{} or latest:{...checkpoint info...}.
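A minimal sketch of that alternative response shape, assuming a hypothetical, simplified checkpoint record (the real REST response would carry many more fields): for an existing job the endpoint always answers, returning an empty `latest` object when no checkpoint is known yet instead of a 404.

```java
import java.util.Optional;

public class LatestCheckpointResponseSketch {

    // Hypothetical, simplified stand-in for the checkpoint details in the response.
    record CheckpointInfo(long id, String status) {}

    // Builds the JSON body by hand to keep the sketch dependency-free.
    // An absent checkpoint yields {"latest":{}} rather than a 404.
    static String toJson(Optional<CheckpointInfo> latest) {
        return latest
                .map(c -> "{\"latest\":{\"id\":" + c.id() + ",\"status\":\"" + c.status() + "\"}}")
                .orElse("{\"latest\":{}}");
    }

    public static void main(String[] args) {
        // Job exists but has no checkpoint yet (e.g., still initializing).
        System.out.println(toJson(Optional.empty()));
        // Job has a completed checkpoint.
        System.out.println(toJson(Optional.of(new CheckpointInfo(42L, "COMPLETED"))));
    }
}
```

With this shape, a 404 can keep meaning only "job not found", and the client checks whether `latest` is empty.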

The FLIP should also cover the error cases for when the endpoint is called for jobs that don't have checkpointing enabled (e.g., batch jobs).

On 22/07/2025 06:35, Ahmed Hamdy wrote:
Hi Poorvank,
Yes, the idea is to look up the latest checkpoint ID from the history and
use it to return the checkpoint details.
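A minimal sketch of that two-step lookup: find the latest completed checkpoint ID in the history, then resolve its details through an ID-keyed cache, populating the cache on a miss. The types here are simplified, hypothetical stand-ins, not Flink's actual CheckpointStatsHistory or CheckpointStatsCache classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LatestCheckpointLookupSketch {

    // Hypothetical, simplified stand-in for a checkpoint stats entry.
    record CheckpointStats(long id, boolean completed) {}

    // Simplified history, ordered oldest first.
    private final List<CheckpointStats> history;
    // Simplified cache keyed by checkpoint ID.
    private final Map<Long, CheckpointStats> cache = new HashMap<>();

    LatestCheckpointLookupSketch(List<CheckpointStats> history) {
        this.history = history;
    }

    // Step 1: scan the history from newest to oldest for the latest completed checkpoint ID.
    Optional<Long> latestCompletedId() {
        for (int i = history.size() - 1; i >= 0; i--) {
            if (history.get(i).completed()) {
                return Optional.of(history.get(i).id());
            }
        }
        return Optional.empty();
    }

    // Step 2: resolve the details by ID, using the cache and populating it on a miss.
    Optional<CheckpointStats> latestCompletedDetails() {
        return latestCompletedId().map(id -> cache.computeIfAbsent(id, this::findInHistory));
    }

    private CheckpointStats findInHistory(long id) {
        return history.stream().filter(c -> c.id() == id).findFirst().orElseThrow();
    }

    public static void main(String[] args) {
        var lookup = new LatestCheckpointLookupSketch(List.of(
                new CheckpointStats(1L, true),
                new CheckpointStats(2L, true),
                new CheckpointStats(3L, false))); // checkpoint 3 is still in progress
        // prints Optional[CheckpointStats[id=2, completed=true]]
        System.out.println(lookup.latestCompletedDetails());
    }
}
```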

    Possible to consider adding type (savepoint/checkpoint) filtering: Since cache returns AbstractCheckpointStats

Yeah, that's a good idea; I believe it might be useful in some cases.
Best Regards
Ahmed Hamdy


On Fri, 18 Jul 2025 at 20:39, Poorvank Bhatia <puravbhat...@gmail.com> wrote:

Hi Ahmed,
Thank you for the FLIP.
+1 (non-binding) for this feature.

I have two implementation questions:

    1. Approach for finding the latest checkpoint: the FLIP mentions "utilizing
    existing CheckpointStatsCache"
    <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886441#FLIP536:AddlatestcheckpointdetailsendpointtoRestAPI-ImplementationDetails>,
    but that cache only supports lookup by checkpoint ID (tryGet(long checkpointId))
    <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointStatsCache.java#L71>.
    Do you intend to use getLatestCompletedCheckpoint()
    <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsHistory.java#L147>
    to find the latest checkpoint and then cache it using
    checkpointStatsCache.tryAdd()? If that is not the intended approach, could
    you clarify?
    2. Possible to consider adding type (savepoint/checkpoint) filtering: since
    the cache returns AbstractCheckpointStats
    <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/AbstractCheckpointStats.java>,
    which has CheckpointProperties
    <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java>
    that can distinguish between regular checkpoints and savepoints, would it be
    valuable to extend the endpoint to support type filtering? e.g.:

          GET /jobs/:jobid/checkpoints/details/latest?status=COMPLETED&type=SAVEPOINT
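A minimal sketch of how the status and type filters could combine when picking the latest entry. The enums and record are hypothetical simplifications (in Flink the checkpoint/savepoint distinction would come from CheckpointProperties), and the sketch assumes a higher checkpoint ID means a more recent checkpoint.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestCheckpointFilterSketch {

    // Hypothetical, simplified enums and entry type.
    enum Status { IN_PROGRESS, COMPLETED, FAILED }
    enum Type { CHECKPOINT, SAVEPOINT }

    record Stats(long id, Status status, Type type) {}

    // Returns the highest-ID entry matching both filters; a null filter means "any".
    static Optional<Stats> latest(List<Stats> history, Status status, Type type) {
        return history.stream()
                .filter(s -> status == null || s.status() == status)
                .filter(s -> type == null || s.type() == type)
                .max(Comparator.comparingLong(Stats::id));
    }

    public static void main(String[] args) {
        List<Stats> history = List.of(
                new Stats(7L, Status.COMPLETED, Type.SAVEPOINT),
                new Stats(8L, Status.COMPLETED, Type.CHECKPOINT),
                new Stats(9L, Status.IN_PROGRESS, Type.CHECKPOINT));
        // Equivalent of ?status=COMPLETED&type=SAVEPOINT; prints Optional[7]
        System.out.println(latest(history, Status.COMPLETED, Type.SAVEPOINT).map(Stats::id));
    }
}
```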

On Fri, Jul 18, 2025 at 9:48 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:

Hi David,
Thanks for the feedback. I guess an alternative approach would be to add
paging and sorting to the checkpoint stats query; however, this would
still require two REST API calls to get the latest checkpoint details, as
the stats endpoint only returns a summary, not the details. I am open to
adding another query parameter to the endpoint in the FLIP to get the
latest X checkpoint details in one go, but I honestly didn't see much of a
use case for more than one, and it might complicate how we handle having
only y available checkpoints, where 0 < y < X.
Let me know your thoughts, and the same goes for the rest of the community.


Best Regards
Ahmed Hamdy


On Fri, 18 Jul 2025 at 16:47, David Radley <david_rad...@uk.ibm.com> wrote:

Hi Ahmed,
Thanks for submitting this FLIP.
What do you think of having /jobs/:jobid/checkpoints with query params to
specify the sort criteria and direction and the number of returned
elements (page size)? This would appear to be a more standard (and
flexible) way of doing a search. To get the latest, you would specify a
page size of 1 with a time sort criterion and descending direction.
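A minimal sketch of those sort-and-page semantics, assuming a hypothetical, simplified entry type and hypothetical direction and page-size parameters; it does not correspond to any existing Flink REST API.

```java
import java.util.Comparator;
import java.util.List;

public class CheckpointPagingSketch {

    // Hypothetical, simplified entry; real checkpoint stats carry many more fields.
    record Entry(long id, long triggerTimestamp) {}

    enum Direction { ASC, DESC }

    // Sorts by trigger time in the requested direction and returns at most pageSize entries.
    // If fewer than pageSize entries exist, all of them are returned.
    static List<Entry> query(List<Entry> entries, Direction direction, int pageSize) {
        Comparator<Entry> byTime = Comparator.comparingLong(Entry::triggerTimestamp);
        if (direction == Direction.DESC) {
            byTime = byTime.reversed();
        }
        return entries.stream().sorted(byTime).limit(pageSize).toList();
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(1L, 1_000L), new Entry(3L, 3_000L), new Entry(2L, 2_000L));
        // "Latest" = page size 1, sorted by time, descending.
        // prints [Entry[id=3, triggerTimestamp=3000]]
        System.out.println(query(entries, Direction.DESC, 1));
    }
}
```

Under these semantics, asking for a page of five on a job with only three checkpoints simply returns three, which is one straightforward way to treat the "fewer available than requested" case.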
  WDYT?
       Warm regards, David.


From: Ahmed Hamdy <hamdy10...@gmail.com>
Date: Friday, 18 July 2025 at 15:48
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] [DISCUSS][FLIP-536] Add latest checkpoint details endpoint to Rest API
Hi Devs,
I would like to start a discussion on FLIP-536 [1], which adds a "latest"
checkpoint details endpoint to Flink's REST API. This is a common need I
have personally encountered when integrating components with Flink via
the REST API.
Let me know your thoughts.


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-536%3A+Add+latest+checkpoint+details+endpoint+to+Rest+API
Best Regards
Ahmed Hamdy


