
Re: Cluster resurrects old job

2018-06-20 Thread Elias Levy

Alas, that error appears to be a red herring. The admin mistyped the cancel command, which led to the error, but immediately corrected it, and the job was then canceled. So it seems unrelated to the job coming back to life later on.

On Wed, Jun 20, 2018 at 10:04 AM Elias Levy wrote:
> The so

Re: Cluster resurrects old job

2018-06-20 Thread Elias Levy

The source of the issue may be this error, which occurred when the job was being canceled on June 5:

June 5th 2018, 14:59:59.430 Failure during cancellation of job c59dd3133b1182ce2c05a5e2603a0646 with savepoint.
java.io.IOException: Failed to create savepoint directory at --checkpoint-dir
    at org.ap

Cluster resurrects old job

2018-06-20 Thread Elias Levy

We had an unusual situation last night. One of our Flink clusters experienced some connectivity issues, which led to the single job running on the cluster failing and then being restored. Then something odd happened: the cluster decided to also restore an old version of the job. One we