Re: Elastic Block Store as checkpoint storage

2023-07-20 Thread David Morávek
Using EBS as checkpoint storage doesn't work in a distributed environment if you need to move the state between TaskManagers (e.g., for rescaling and non-local recovery). You'd need something along the lines of read-write multi-attach, with the volumes set up in a smart way, and that won't be easy to set up. I'm not aware

Re: Elastic Block Store as checkpoint storage

2023-07-19 Thread Prabhu Joseph
Thanks for sharing the information. I observed the same: S3 (primary checkpoint storage) + EBS (task-local recovery) performs better than EBS as primary checkpoint storage. On Tue, Jul 18, 2023 at 12:21 PM Konstantin Knauf wrote: > Hi Prabhu, > > this should be possible, but is quite exp
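For readers who want to try the S3 + EBS combination mentioned above, here is a minimal sketch that renders the relevant Flink configuration entries. The option keys are the ones I believe apply to Flink releases from around the time of this thread (newer releases may have renamed some of them). The bucket name, the mount path, and the helper `render_flink_conf` are hypothetical and only serve the illustration.

```python
def render_flink_conf(checkpoint_uri: str, local_recovery_dir: str) -> str:
    """Render flink-conf.yaml entries for S3 checkpoints plus task-local recovery.

    checkpoint_uri:      durable primary checkpoint location (assumed to be on S3)
    local_recovery_dir:  directory on the EBS volume mounted into each TaskManager
    """
    if not checkpoint_uri.startswith("s3://"):
        raise ValueError("primary checkpoint storage is expected to be an s3:// URI")

    entries = {
        # Primary, durable checkpoint storage shared by all TaskManagers.
        "state.checkpoint-storage": "filesystem",
        "state.checkpoints.dir": checkpoint_uri,
        # Keep a secondary local copy of the state for fast local recovery.
        "state.backend.local-recovery": "true",
        # Where the local copy lives; here a path on the mounted EBS volume.
        "taskmanager.state.local.root-dirs": local_recovery_dir,
    }
    return "\n".join(f"{key}: {value}" for key, value in entries.items())


if __name__ == "__main__":
    # Hypothetical bucket and mount path.
    print(render_flink_conf("s3://example-bucket/flink/checkpoints",
                            "/mnt/ebs/flink-local-state"))
```

With this split, S3 stays the source of truth for rescaling and non-local recovery, while the EBS path only holds the local copy used when a task restarts on the same TaskManager.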

Re: Elastic Block Store as checkpoint storage

2023-07-17 Thread Konstantin Knauf
Hi Prabhu, this should be possible, but it is quite expensive compared to AWS S3, and you have to remount the EBS volumes to the new TaskManagers after a failure, which takes a non-trivial amount of time and slows down recovery. So, overall I don't think it's preferable to S3. We do use

Elastic Block Store as checkpoint storage

2023-07-12 Thread Prabhu Joseph
Hi, We are investigating the feasibility of using Elastic Block Store (EBS) as checkpoint storage by mounting a volume (a shared local file system path) to the JobManager and all the TaskManager pods. I would like to hear feedback on this approach from anyone who has already tried it. Thanks, Prabhu