I am using S3 as checkpoint storage for Flink, both on an EMR (EC2) + YARN setup and on EKS. There should not be any problem with it.
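For the REST part of the question, a minimal sketch of such a call could look like the following. It assumes the JobManager REST endpoint is reachable at a base URL such as `http://localhost:8081`, that the jar has already been uploaded, and that the checkpoint path points at the retained checkpoint directory that holds the `_metadata` file. The function names, the base URL, and the jar id are illustrative placeholders, not values from this thread.

```python
import json
import urllib.request


def build_run_request(base_url, jar_id, checkpoint_path, allow_non_restored_state=False):
    """Build (but do not send) a POST request for /jars/:jarid/run.

    checkpoint_path is passed in the 'savepointPath' field; the same field
    is used whether the path refers to a savepoint or a retained checkpoint.
    """
    url = f"{base_url.rstrip('/')}/jars/{jar_id}/run"
    body = {
        "savepointPath": checkpoint_path,
        "allowNonRestoredState": allow_non_restored_state,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_jar_from_checkpoint(base_url, jar_id, checkpoint_path):
    """Send the request and return the parsed JSON response from Flink."""
    request = build_run_request(base_url, jar_id, checkpoint_path)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical values: replace the jar id and keep your own S3 path.
    req = build_run_request(
        "http://localhost:8081",
        "example-job.jar",
        "s3://<path_to_checkpoint>",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and JSON body before pointing the call at a real cluster. The cluster itself still needs one of the S3 filesystem plugins and working AWS credentials for the `s3://` path to resolve.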

Thanks
Sachin

On Thu, Apr 24, 2025 at 12:09 PM Anuj Jain <anuj...@gmail.com> wrote:

> Dear Apache Flink Community,
>
> I hope this message finds you well. We are currently exploring the option
> of utilizing Amazon S3 as a checkpoint storage solution alongside our
> Apache Flink server. As part of this effort, we understand that AWS S3
> access must be configured properly, and checkpoints need to be externalized
> and retained on S3.
>
> In the Apache Flink documentation, I found the following information about
> externalized checkpoints:
>
> "Externalized Checkpoints – Normally, checkpoints are not intended to be
> manipulated by users. Flink retains only the n-most-recent checkpoints (n
> being configurable) while a job is running and deletes them when a job is
> cancelled. However, you can configure them to be retained, allowing manual
> resumption from them."
>
> Given this context, I would appreciate the community's input on the
> following query:
>
> Is it officially supported in a production setup to use the Flink run API
> to resume a job from an externalized checkpoint stored on AWS S3?
> Specifically, is it possible to invoke the REST API endpoint
> "/jars/:jarid/run" and provide 'savepointPath' as a checkpoint directory on
> S3 storage, such as "s3://<path_to_checkpoint>"?
>
> Your insights and experiences would be invaluable as we evaluate this
> approach. Thank you for your assistance and support.