Stefan,
Can’t thank you enough for this write-up. This is an awesome explanation. I had
misunderstood the concepts of the RocksDB working directory and the checkpoint
FS. My main intent is to boost the performance of RocksDB with the SSDs
available locally. Recovery time from HDFS is not much of a concern, but the
load on …
Hi,
Ok, let me briefly explain the differences between the local working directory,
the checkpoint directory, and the savepoint directory, and also outline their
best practices/requirements/tradeoffs. The first easy comment is that
checkpoints and savepoints typically have similar requirements, and most users
wri…
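For reference, a minimal sketch of how these three locations are typically
wired up in flink-conf.yaml (the paths below are illustrative placeholders,
not the ones from this thread):

    # RocksDB keeps its live working state on fast local disk
    state.backend: rocksdb
    state.backend.rocksdb.localdir: /mnt/ssd1/flink/rocksdb
    # checkpoints and savepoints go to durable, distributed storage
    state.checkpoints.dir: hdfs:///flink/checkpoints
    state.savepoints.dir: hdfs:///flink/savepoints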
Sorry,
Just a follow-up. In the absence of a NAS, is the best option to go with here
checkpoints and savepoints both on HDFS, with the state backend using local
SSDs? We were trying to not even hit HDFS other than for savepoints.
- Ashish
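That split (durable checkpoints and savepoints on HDFS, RocksDB working files
on the local SSD) can also be set up programmatically. A rough sketch, assuming
the RocksDB state backend and hypothetical paths (the class name is made up for
illustration):

    import java.io.IOException;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SsdStateBackendSketch {
        public static void main(String[] args) throws IOException {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpoint (and savepoint) data goes to HDFS;
            // 'true' enables incremental checkpoints.
            RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints", true);
            // RocksDB's working directory lives on the local SSD.
            backend.setDbStoragePath("/mnt/ssd1/flink/rocksdb");
            env.setStateBackend(backend);
        }
    }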
Stefan,
I did have the first point in the back of my mind. I was under the impression,
though, that for checkpoints the cleanup would be done by the TMs, since the
checkpoints are taken by the TMs.
So for a standalone cluster with its own ZooKeeper for JM high availability, a
NAS is a must-have? We were going to go with local …
Hi,
I am wondering how this can even work properly if you are using a local FS for
checkpoints instead of a distributed FS. First, what happens under node
failures, if the SSD becomes unavailable, or if a task gets scheduled to a
different machine and can no longer access the disk with the cor…
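Worth noting for the underlying goal here (fast restores from local disk
without giving up a durable checkpoint store): Flink 1.5 and later offer
task-local recovery, which keeps a secondary local copy of task state alongside
the primary checkpoint in the distributed FS. A sketch of the relevant
flink-conf.yaml entries, with illustrative paths:

    # primary checkpoints still go to the distributed FS
    state.checkpoints.dir: hdfs:///flink/checkpoints
    # additionally keep a local copy of task state for fast restore
    state.backend.local-recovery: true
    taskmanager.state.local.root-dirs: /mnt/ssd1/flink/local-recovery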
All,
We recently moved our checkpoint directory from HDFS to local SSDs mounted on
the data nodes (we were starting to see performance impacts on checkpoints etc.
as complex ML apps were spinning up more and more in YARN). This worked great
other than the fact that when jobs are being canceled or canceled…