[ https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104103#comment-17104103 ]
josson paul kalapparambath edited comment on FLINK-17560 at 5/11/20, 6:00 AM:
------------------------------------------------------------------------------

[~xintongsong]

The Job Manager is completely restarted as part of some upgrade process.

[https://github.com/apache/flink/blob/d54807ba10d0392a60663f030f9fe0bfa1c66754/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/slot/TaskSlotTable.java#L320]

If taskSlot.markFree() does not return true, at what point is the taskSlot marked as free? I am not able to find the code where the slot state is set to FREE.

was (Author: josson):
[~xintongsong]

Job Manager is completely restarted either as part of some upgrade process.

[https://github.com/apache/flink/blob/d54807ba10d0392a60663f030f9fe0bfa1c66754/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/slot/TaskSlotTable.java#L320]

If taskSlot.markFree() does not return true, at what point the taskSlot is marked as free?. I am not able to find the code where the slot is assigned as FREE.

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -------------------------------------------------------------------------
>
> Key: FLINK-17560
> URL: https://issues.apache.org/jira/browse/FLINK-17560
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.8.3
> Environment: Flink version 1.8.3
> Session cluster
> Reporter: josson paul kalapparambath
> Priority: Major
>
> Set up
> ------
> Flink version 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (Same Node)
> 1 TaskManager
> 4 pipelines running with various parallelisms
> Issue
> ------
> Occasionally, when the Job Manager gets restarted, we noticed that all the
> pipelines are not getting scheduled. The error reported by the Job
> Manager is 'not enough slots are available'. This should not be the case,
> because the task manager was deployed with sufficient slots for the number of
> pipelines/parallelism we have.
> We further noticed that the slot report sent by the task manager contains slots
> filled with old CANCELLED job IDs. I am not sure why the task manager still
> holds the details of the old jobs. A thread dump on the task manager confirms
> that the old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865, but that is
> not the issue happening in this case.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)