Thanks for the FLIP. It is useful for users. I have only one question:
could JM memory pressure under high-concurrency sampling cause OOM in
large-scale jobs?

> On March 24, 2026, at 16:29, Jiangang Liu <[email protected]> wrote:
> 
> Hi everyone,
> 
> I would like to start a discussion on FLIP-571: Support Dynamically
> Updating Checkpoint Configuration at Runtime via REST API [1].
> 
> Currently, checkpoint configuration (checkpointInterval, checkpointTimeout)
> is immutable after job submission. This creates significant operational
> challenges for long-running streaming jobs:
> 
>   1. Cascading checkpoint failures cannot be resolved without restarting
>   the job, causing data reprocessing delays.
>   2. Near-complete checkpoints (e.g., 95% persisted) are entirely discarded
>   on timeout — wasting all I/O work and potentially creating a failure
>   loop for large-state jobs.
>   3. Static configuration cannot adapt to variable workloads at runtime.
> 
> FLIP-571 proposes a new REST API endpoint:
> 
> PATCH /jobs/:jobid/checkpoints/configuration
> 
> Key design points:
> 
>   - Timeout changes apply immediately to in-flight checkpoints by
>   rescheduling their canceller timers, saving near-complete checkpoints
>   from being discarded.
>   - Interval changes take effect on the next checkpoint trigger cycle.
>   - Configuration overrides are persisted to ExecutionPlanStore (following
>   the JobResourceRequirements pattern) and automatically restored after
>   failover.
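
For readers following along, here is a minimal sketch of what a client call to the proposed endpoint might look like. It only builds the request and does not send it. The JSON field names (checkpointInterval, checkpointTimeout) are assumed from the configuration names mentioned above; the actual request schema is whatever the FLIP defines. The base URL and job ID are hypothetical placeholders.

```python
import json
import urllib.request


def build_checkpoint_config_patch(base_url, job_id, interval_ms=None, timeout_ms=None):
    """Build (but do not send) a PATCH request for the proposed endpoint."""
    # Assumption: the body uses these field names; check the FLIP for the real schema.
    body = {}
    if interval_ms is not None:
        body["checkpointInterval"] = interval_ms
    if timeout_ms is not None:
        body["checkpointTimeout"] = timeout_ms
    if not body:
        raise ValueError("at least one of interval_ms or timeout_ms is required")

    url = f"{base_url.rstrip('/')}/jobs/{job_id}/checkpoints/configuration"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Hypothetical JobManager address and job ID, for illustration only.
request = build_checkpoint_config_patch(
    "http://localhost:8081",
    "a1b2c3d4e5f60718293a4b5c6d7e8f90",
    timeout_ms=30 * 60 * 1000,  # extend the timeout to 30 minutes
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending only the fields that change matches the PATCH semantics of a partial update, so a timeout-only change would leave the interval untouched, assuming the endpoint treats omitted fields that way.
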
> 
> For more details, please refer to the FLIP [1].
> 
> Looking forward to your feedback and suggestions!
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-571%3A+Support+Dynamically+Updating+Checkpoint+Configuration+at+Runtime+via+REST+API
> 
> Best regards,
> Jiangang Liu
