[ https://issues.apache.org/jira/browse/FLINK-32881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-32881:
-----------------------------------
    Labels: detach-savepoint pull-request-available  (was: detach-savepoint)

> Client supports making savepoints in detach mode
> ------------------------------------------------
>
>                 Key: FLINK-32881
>                 URL: https://issues.apache.org/jira/browse/FLINK-32881
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission, Runtime / Checkpointing
>    Affects Versions: 1.19.0
>            Reporter: Renxiang Zhou
>            Assignee: Renxiang Zhou
>            Priority: Major
>              Labels: detach-savepoint, pull-request-available
>             Fix For: 1.19.0
>
>         Attachments: image-2023-08-16-17-14-34-740.png,
> image-2023-08-16-17-14-44-212.png
>
>
> When a savepoint is triggered from the command-line tool, the client must
> wait for the job to finish creating the savepoint before it can exit. For
> jobs with large state, savepoint creation can be time-consuming, which
> leads to the following problems:
> # Platform users may need to manage thousands of Flink tasks on a single
> client machine. With the current savepoint triggering mode, every savepoint
> creation thread on that machine has to wait for its job to finish the
> snapshot, which wastes significant resources;
> # If savepoint creation takes longer than the client's timeout, the client
> throws a timeout exception and reports that triggering the savepoint
> failed. Since savepoint durations vary from job to job, it is difficult to
> tune the timeout parameter on the client side.
> Therefore, we propose adding a detach mode for triggering savepoints on the
> client side, similar to the detach mode used when submitting jobs.
> Here are some specific details:
> # The savepoint UUID will be generated on the client side. After
> successfully triggering the savepoint, the client immediately returns the
> UUID and exits.
> # Add a "dump-pending-savepoints" API that allows the client to check
> whether the triggered savepoint has been successfully created.
> With these changes, the client can detach from the savepoint creation
> process, which reduces resource waste, and it still has a way to check the
> status of savepoint creation.
> !image-2023-08-16-17-14-34-740.png|width=2129,height=625!!image-2023-08-16-17-14-44-212.png|width=1530,height=445!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
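
For illustration only, the detach flow proposed in the description (generate the UUID on the client, return right after triggering, query pending savepoints later) could be sketched as below. This is a minimal in-memory simulation, not Flink code: `FakeJobManager`, `trigger_detached`, `finish_savepoint`, and the method name `dump_pending_savepoints` are hypothetical names chosen for this sketch, and the real client and REST interfaces may look quite different.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class SavepointStatus(Enum):
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"


@dataclass
class FakeJobManager:
    """Hypothetical in-memory stand-in for the cluster side; not a Flink API."""

    # Maps savepoint_id -> (job_id, status).
    savepoints: dict = field(default_factory=dict)

    def trigger_savepoint(self, job_id: str, savepoint_id: str) -> None:
        # Record the request and return without waiting for the snapshot.
        self.savepoints[savepoint_id] = (job_id, SavepointStatus.PENDING)

    def finish_savepoint(self, savepoint_id: str) -> None:
        # Simulates the job completing the snapshot at some later point.
        job_id, _ = self.savepoints[savepoint_id]
        self.savepoints[savepoint_id] = (job_id, SavepointStatus.COMPLETED)

    def dump_pending_savepoints(self, job_id: str) -> list:
        # IDs of this job's savepoints that are still in progress.
        return [
            sid
            for sid, (jid, status) in self.savepoints.items()
            if jid == job_id and status is SavepointStatus.PENDING
        ]


def trigger_detached(jm: FakeJobManager, job_id: str) -> str:
    # The UUID is generated on the client, so it can be handed back
    # immediately instead of after the savepoint completes.
    savepoint_id = str(uuid.uuid4())
    jm.trigger_savepoint(job_id, savepoint_id)
    return savepoint_id


if __name__ == "__main__":
    jm = FakeJobManager()
    sid = trigger_detached(jm, "job-1")                 # client could exit here
    print(sid in jm.dump_pending_savepoints("job-1"))   # True
    jm.finish_savepoint(sid)
    print(jm.dump_pending_savepoints("job-1"))          # []
```

Because the client picks the identifier, it never blocks on the snapshot and no client-side timeout needs tuning per job; whether the savepoint succeeded is learned afterwards through the separate status query.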