Restore from a savepoint is very slow

Dongwon Kim Sun, 01 Apr 2018 22:31:23 -0700

Hi,

While restoring from the latest checkpoint starts immediately after the job is 
restarted, restoring from a savepoint takes more than five minutes before the 
job makes any progress.
During this blackout, I cannot observe any resource usage across the cluster.
After that period, various metrics, including 
flink_taskmanager_job_task_currentLowWatermark, show that Flink is trying to 
catch up with the source topic.


FYI, I'm using
- Flink-1.4.2
- FsStateBackend configured with HDFS
- EventTime with BoundedOutOfOrdernessTimestampExtractor

Each checkpoint/savepoint is ~50GB in size, and we have 7 servers for 
taskmanagers.

Best,

- Dongwon
