Alas, that error appears to be a red herring. The admin mistyped the cancel
command, which led to the error, but immediately corrected it, and the job
was then canceled. So it seems unrelated to the job coming back to life
later on.
On Wed, Jun 20, 2018 at 10:04 AM Elias Levy
wrote:
The source of the issue may be this error that occurred when the job was
being canceled on June 5:
June 5th 2018, 14:59:59.430 Failure during cancellation of job
c59dd3133b1182ce2c05a5e2603a0646 with savepoint.
java.io.IOException: Failed to create savepoint directory at
--checkpoint-dir
at
org.ap
We had an unusual situation last night. One of our Flink clusters
experienced some connectivity issues, which led to the single job
running on the cluster failing and then being restored.
And then something odd happened. The cluster decided to also restore an
old version of the job. One we