Thanks for sharing this information. I observed the same thing: S3 as the primary checkpoint storage, combined with EBS for task-local recovery, performs better than using EBS as the primary checkpoint storage.
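For anyone who wants to try this layout, here is a minimal sketch that renders the relevant flink-conf.yaml options. It covers S3 as primary checkpoint storage, EBS for task-local recovery, and the EBS-backed RocksDB working directory Konstantin mentions in his reply. The option keys are standard Flink 1.x configuration keys. The bucket name and the /mnt/ebs mount paths are hypothetical placeholders. The sketch also assumes an S3 filesystem plugin (flink-s3-fs-hadoop or flink-s3-fs-presto) is installed.

```python
# A minimal sketch: render flink-conf.yaml entries for S3 checkpoints plus
# EBS-backed task-local recovery. The bucket name and /mnt/ebs paths are
# hypothetical placeholders, not values taken from this thread.

CHECKPOINT_LAYOUT = {
    # RocksDB state backend with incremental checkpoints.
    "state.backend": "rocksdb",
    "state.backend.incremental": "true",
    # Primary (durable) checkpoint storage on S3 (hypothetical bucket).
    "state.checkpoints.dir": "s3://my-bucket/flink/checkpoints",
    # Keep a secondary copy of task state on the local EBS volume.
    "state.backend.local-recovery": "true",
    "taskmanager.state.local.root-dirs": "/mnt/ebs/flink/local-recovery",
    # RocksDB working directory on the same EBS volume.
    "state.backend.rocksdb.localdir": "/mnt/ebs/flink/rocksdb",
}


def render_flink_conf(options: dict) -> str:
    """Return the options as 'key: value' lines in flink-conf.yaml style."""
    # Sanity checks for this particular layout (not enforced by Flink itself).
    if not options["state.checkpoints.dir"].startswith("s3://"):
        raise ValueError("primary checkpoint storage is expected to be on S3")
    for key in ("taskmanager.state.local.root-dirs", "state.backend.rocksdb.localdir"):
        if "://" in options[key]:
            raise ValueError(f"{key} should be a local (EBS-mounted) path")
    return "".join(f"{key}: {value}\n" for key, value in options.items())


if __name__ == "__main__":
    print(render_flink_conf(CHECKPOINT_LAYOUT), end="")
```

Note that local recovery only helps when a task is rescheduled onto a TaskManager that still holds its local state. If the TaskManager and its volume are lost, Flink restores from the copy in S3 instead, so no EBS remount is needed.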
On Tue, Jul 18, 2023 at 12:21 PM Konstantin Knauf <kna...@apache.org> wrote:

> Hi Prabhu,
>
> this should be possible, but it is quite expensive in comparison to AWS S3,
> and you have to remount the EBS volumes to the new TaskManagers in case of
> a failure, which takes non-trivial time and slows down recovery. So,
> overall I don't think it's preferable to S3.
>
> We do use EBS volumes, though, for the local RocksDB working directory. We
> don't remount them on failure right now, though, due to the additional
> latency that this introduces.
>
> Cheers,
>
> Konstantin
>
> On Wed, Jul 12, 2023 at 6:55 PM Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>
>> Hi,
>>
>> We are investigating the feasibility of using an Elastic Block Store
>> (EBS) volume as checkpoint storage by mounting it (as a shared local file
>> system path) to the JobManager and all the TaskManager pods. I would like
>> to hear any feedback on this approach if anyone has already tried it.
>>
>> Thanks,
>> Prabhu Joseph
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk