I think the idea of filtering is interesting but I do wonder if we
should introduce it as part of this FLIP.
That seems like something we'd maybe want to introduce consistently for
all checkpoint-related endpoints.
I'm also not sure about returning a 404 when the job exists but no
checkpoint does (especially with the filtering).
It's a bit annoying to handle on the client side, especially since there
are other causes of a 404, and it can happen spuriously despite no issue
on the client side, e.g., when the job is still initializing or has just
started, or when the JM has restarted and lost the checkpoint history
(I'm not sure whether the checkpoint we restore from is included in
there).
As an alternative, the response could contain either latest:{} or
latest:{...checkpoint info...}.
The FLIP should also cover the error cases when it is called for jobs
that don't have checkpointing enabled (e.g., batch).
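To make the client-side difference concrete, here is a minimal sketch in Python of how a caller might handle the latest:{} alternative. The response shape, the `latest` key, and the function name `latest_checkpoint` are assumptions for illustration only; the FLIP does not define them.

```python
from typing import Optional


def latest_checkpoint(status: int, body: dict) -> Optional[dict]:
    """Return the latest checkpoint info, or None if the job has none yet.

    Assumed (hypothetical) contract: the endpoint answers 200 with
    {"latest": {}} when the job exists but has no matching checkpoint,
    and reserves 404 for a job that is actually missing.
    """
    if status == 404:
        # Unambiguous under this contract: the job itself was not found.
        raise LookupError("job not found")
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    latest = body.get("latest") or {}
    # An empty object means "no checkpoint yet", which is not an error.
    return latest or None


print(latest_checkpoint(200, {"latest": {}}))
print(latest_checkpoint(200, {"latest": {"id": 42}}))
```

With this shape, a job that is still initializing yields None rather than an error, so the client does not have to guess which kind of 404 it received.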
On 22/07/2025 06:35, Ahmed Hamdy wrote:
Hi Poorvank,
Yes, the idea is to look up the latest checkpoint ID from the history and
use it to return the checkpoint details.
> Possible to consider adding type (savepoint/checkpoint)
> filtering: Since cache returns AbstractCheckpointStats
Yeah, that's a good idea; I believe it might be useful in some cases.
Best Regards
Ahmed Hamdy
On Fri, 18 Jul 2025 at 20:39, Poorvank Bhatia <puravbhat...@gmail.com>
wrote:
Hi Ahmed, Thank you for the FLIP.
+1 (non-binding) for this feature.
I have two implementation questions:
1. Approach for finding latest checkpoints: The FLIP mentions "utilizing
existing CheckpointStatsCache"
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886441#FLIP536:AddlatestcheckpointdetailsendpointtoRestAPI-ImplementationDetails>,
but that cache only supports lookup by checkpoint ID (tryGet(long
checkpointId))
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointStatsCache.java#L71>.
Do you intend to use getLatestCompletedCheckpoint()
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsHistory.java#L147>
to find the latest checkpoint and then cache it using
checkpointStatsCache.tryAdd()? Is this the intended approach? If not, can
you clarify?
2. Possible to consider adding type (savepoint/checkpoint)
filtering: Since cache returns AbstractCheckpointStats
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/AbstractCheckpointStats.java>,
which has CheckpointProperties
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java>
that can distinguish between regular checkpoints and savepoints, would it
be valuable to extend the endpoint to support type filtering? For example:
GET /jobs/:jobid/checkpoints/details/latest?status=COMPLETED&type=SAVEPOINT
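The two questions above can be sketched together. This is a language-neutral illustration in Python, not Flink code: `Stats` is a hypothetical stand-in for AbstractCheckpointStats, the plain dict stands in for CheckpointStatsCache, and the sketch assumes that a higher checkpoint ID means a more recent checkpoint.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Stats:
    """Hypothetical stand-in for AbstractCheckpointStats."""
    checkpoint_id: int
    status: str          # e.g. "COMPLETED", "FAILED", "IN_PROGRESS"
    is_savepoint: bool   # stand-in for what CheckpointProperties exposes


def find_latest(history: List[Stats], cache: Dict[int, Stats],
                status: Optional[str] = None,
                type_: Optional[str] = None) -> Optional[Stats]:
    """Pick the newest entry matching the filters, then cache it by ID."""
    def matches(s: Stats) -> bool:
        if status is not None and s.status != status:
            return False
        if type_ == "SAVEPOINT" and not s.is_savepoint:
            return False
        if type_ == "CHECKPOINT" and s.is_savepoint:
            return False
        return True

    candidates = [s for s in history if matches(s)]
    if not candidates:
        return None
    # Assumption: checkpoint IDs increase over time, so max ID is the latest.
    latest = max(candidates, key=lambda s: s.checkpoint_id)
    # Add to the by-ID cache so later lookups by checkpoint ID can reuse it.
    cache.setdefault(latest.checkpoint_id, latest)
    return latest


history = [Stats(1, "COMPLETED", False),
           Stats(2, "COMPLETED", True),
           Stats(3, "FAILED", False)]
cache: Dict[int, Stats] = {}
print(find_latest(history, cache, status="COMPLETED", type_="SAVEPOINT"))
```

Returning None when nothing matches leaves the choice of response (404 or an empty object) to the handler.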
On Fri, Jul 18, 2025 at 9:48 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:
Hi David,
Thanks for the feedback. I guess an alternative approach would be adding
paging and sorting to the checkpointing stats query. However, this would
still require two REST API calls to get the latest checkpoint details, as
the stats endpoint only gives a summary, not the details. I am open to
adding another query parameter to the endpoint in the FLIP to get the
latest X checkpoint details in one go, but I honestly didn't see much of a
use case for more than one, and it might complicate how we want to handle
having y available checkpoints where 0 < y < X.
Let me know your thoughts as well as the rest of the community.
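For the 0 < y < X case, one possible behaviour (an assumption for illustration, not something the FLIP specifies) is to return the y checkpoints that exist instead of failing. A minimal Python sketch, with `latest_n` as a hypothetical helper and checkpoint IDs assumed to increase over time:

```python
from typing import List


def latest_n(checkpoint_ids: List[int], x: int) -> List[int]:
    """Return up to x checkpoint IDs, newest first.

    If only y < x checkpoints are available, all y are returned.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    return sorted(checkpoint_ids, reverse=True)[:x]


print(latest_n([7, 3, 5], 5))  # only 3 available, so 3 are returned
```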
Best Regards
Ahmed Hamdy
On Fri, 18 Jul 2025 at 16:47, David Radley <david_rad...@uk.ibm.com>
wrote:
Hi Ahmed,
Thanks for submitting this FLIP.
What do you think of having /jobs/:jobid/checkpoints take query params to
specify the sort criteria, the sort direction, and the number of returned
elements (page size)? This would appear to be a more standard (and
flexible) way of doing a search. To get the latest, you would specify a
page size of 1 with a time sort criterion and descending direction.
WDYT?
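As a rough illustration of the suggestion above, the request for the latest checkpoint could be built as follows. The parameter names `sortBy`, `order`, and `pageSize` are hypothetical; the thread does not name them.

```python
from urllib.parse import urlencode


def checkpoints_query(job_id: str, sort_by: str = "time",
                      order: str = "desc", page_size: int = 1) -> str:
    """Build a paged, sorted query path for a job's checkpoints."""
    # Hypothetical parameter names, chosen only for this sketch.
    params = {"sortBy": sort_by, "order": order, "pageSize": page_size}
    return f"/jobs/{job_id}/checkpoints?{urlencode(params)}"


print(checkpoints_query("abc123"))
# prints /jobs/abc123/checkpoints?sortBy=time&order=desc&pageSize=1
```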
Warm regards, David.
From: Ahmed Hamdy <hamdy10...@gmail.com>
Date: Friday, 18 July 2025 at 15:48
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] [DISCUSS][FLIP-536] Add latest checkpoint details
endpoint to Rest API
Hi Devs,
I would like to start a discussion on FLIP-536 [1] for adding a "latest"
checkpoint details endpoint to Flink's REST API. This is a common case I
have personally encountered when integrating components with Flink using
the REST API.
Let me know your thoughts.
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-536%3A+Add+latest+checkpoint+details+endpoint+to+Rest+API
Best Regards
Ahmed Hamdy