Dominik Wosiński created FLINK-9575:
---------------------------------------

             Summary: Potential race condition when removing JobGraph in HA
                 Key: FLINK-9575
                 URL: https://issues.apache.org/jira/browse/FLINK-9575
             Project: Flink
          Issue Type: Bug
            Reporter: Dominik Wosiński


When we are removing the _JobGraph_ from _JobManager_ for example after 
invoking _cancel()_, the following code is executed : 
{noformat}
 
val futureOption = currentJobs.get(jobID) match { case Some((eg, _)) => val 
result = if (removeJobFromStateBackend) { val futureOption = Some(future { try 
{ // ...otherwise, we can have lingering resources when there is a concurrent 
shutdown // and the ZooKeeper client is closed. Not removing the job 
immediately allow the // shutdown to release all resources. 
submittedJobGraphs.removeJobGraph(jobID) } catch { case t: Throwable => 
log.warn(s"Could not remove submitted job graph $jobID.", t) } 
}(context.dispatcher)) try { archive ! decorateMessage( ArchiveExecutionGraph( 
jobID, ArchivedExecutionGraph.createFrom(eg))) } catch { case t: Throwable => 
log.warn(s"Could not archive the execution graph $eg.", t) } futureOption } 
else { None } currentJobs.remove(jobID) result case None => None } // remove 
all job-related BLOBs from local and HA store 
libraryCacheManager.unregisterJob(jobID) blobServer.cleanupJob(jobID, 
removeJobFromStateBackend) jobManagerMetricGroup.removeJob(jobID) futureOption }
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to