State Recovery when job fails and auto-recovers

Sameer Wadkar Wed, 17 Oct 2018 16:20:02 -0700

Hi,

We have a job that uses ValueState. We have turned off checkpoints. The 
state is backed by RocksDB, which in turn is backed by S3.


If the job fails with an exception (e.g., partitions not available or an 
occasional S3 404 error) and auto-recovers, is the entire state lost, or does 
the job continue from the last saved state? We see that the job keeps the same 
identifier. We don’t mind losing data during the short interval while the job 
is recovering. But because we use ValueState as a custom global window to 
accumulate state for each key over a 3-hour window, we don’t want to lose all 
of it.

Checkpointing is not an option because the state is huge and each checkpoint 
takes too long.

Thanks,
Sameer

