Re: Canceling a failing/restarting job

Ufuk Celebi Fri, 13 Nov 2015 06:24:53 -0800

Not that I am aware of. This is most probably a bug.

Looking at the code of the ExecutionGraph:


A job can only be cancelled when its status is CREATED or RUNNING. If the job
fails during execution, it is in state FAILED until it switches to RESTARTING.
Neither of these states is cancellable, so a cancel request that arrives then is
ignored. After the ExecutionGraph is reset, the state is CREATED again (so it is
cancellable) until the job is scheduled for execution, which then fails it again.

So cancelling should work if the request happens to arrive in that window, right
before the job is scheduled again. :D
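To make the guard concrete, here is a toy Python model of the state transitions
described above. It is a sketch under stated assumptions: `ToyExecutionGraph`, its
methods, and the reduced `JobStatus` enum are hypothetical and are not Flink's
actual ExecutionGraph API. A scheduling failure is modeled as a direct CREATED to
FAILED transition, skipping any intermediate states. The model shows why a cancel
issued while the job is RESTARTING is dropped, and why the same request succeeds
in the short CREATED window after the reset.

```python
from enum import Enum, auto


class JobStatus(Enum):
    # Only the states mentioned in this thread (simplified, hypothetical subset).
    CREATED = auto()
    RUNNING = auto()
    FAILED = auto()
    RESTARTING = auto()
    CANCELED = auto()


class ToyExecutionGraph:
    """Hypothetical toy model of the cancel guard described above, not Flink code."""

    # Assumption taken from the thread: cancel only has an effect in these states.
    CANCELLABLE = frozenset({JobStatus.CREATED, JobStatus.RUNNING})

    def __init__(self) -> None:
        self.status = JobStatus.CREATED

    def cancel(self) -> bool:
        """Return True if the cancel request took effect, False if it was ignored."""
        if self.status in self.CANCELLABLE:
            self.status = JobStatus.CANCELED
            return True
        return False

    def schedule(self, resources_available: bool) -> None:
        # Only a CREATED graph is scheduled; a canceled graph stays canceled.
        if self.status is not JobStatus.CREATED:
            return
        # Simplification: a scheduling failure goes straight from CREATED to FAILED.
        self.status = JobStatus.RUNNING if resources_available else JobStatus.FAILED

    def begin_restart(self) -> None:
        # FAILED -> RESTARTING
        if self.status is JobStatus.FAILED:
            self.status = JobStatus.RESTARTING

    def reset(self) -> None:
        # RESTARTING -> CREATED: the window in which cancel is honoured again.
        if self.status is JobStatus.RESTARTING:
            self.status = JobStatus.CREATED


if __name__ == "__main__":
    g = ToyExecutionGraph()
    g.schedule(resources_available=False)  # CREATED -> FAILED
    g.begin_restart()                      # FAILED -> RESTARTING
    print(g.cancel(), g.status.name)       # prints "False RESTARTING": request ignored
    g.reset()                              # RESTARTING -> CREATED
    print(g.cancel(), g.status.name)       # prints "True CANCELED": the narrow window
    g.schedule(resources_available=False)  # no-op, the job stays CANCELED
    print(g.status.name)                   # prints "CANCELED"
```

In this model the restart loop (schedule, fail, restart, reset) keeps spinning as
long as every cancel lands outside CREATED or RUNNING, which matches the symptom
reported below.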

– Ufuk

> On 13 Nov 2015, at 15:07, Gyula Fóra <gyula.f...@gmail.com> wrote:
> 
> Hey,
> 
> Is there any other way to cancel a job besides ./bin/flink cancel jobId?
> This doesn't seem to work when a job cannot be scheduled and is retrying
> over and over again.
> 
> The exception I get:
> 
> 13:58:11,240 INFO  org.apache.flink.runtime.jobmanager.JobManager - Status of job 0c895d22c632de5dfe16c42a9ba818d5 (player-id) changed to RESTARTING.
> 13:58:25,234 INFO  org.apache.flink.runtime.jobmanager.JobManager - Trying to cancel job with ID 0c895d22c632de5dfe16c42a9ba818d5.
> 13:58:25,561 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@127.0.0.1:42012] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 
> 
> I will open a JIRA issue for this; in the meantime, it would still be good to
> kill it somehow.
> 
> 
> Cheers,
> 
> Gyula
