[jira] [Comment Edited] (FLINK-17560) No Slots available exception in Apache Flink Job Manager while Scheduling

josson paul kalapparambath (Jira) Sun, 10 May 2020 23:01:18 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104103#comment-17104103
 ]


josson paul kalapparambath edited comment on FLINK-17560 at 5/11/20, 6:00 AM:
------------------------------------------------------------------------------

[~xintongsong]

   The Job Manager was completely restarted as part of an upgrade process.

[https://github.com/apache/flink/blob/d54807ba10d0392a60663f030f9fe0bfa1c66754/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/slot/TaskSlotTable.java#L320]

If taskSlot.markFree() does not return true, at what point is the taskSlot 
marked as free? I am not able to find the code where the slot state is set to 
FREE.

 


was (Author: josson):
[~xintongsong]

   Job Manager is completely restarted either as part of some upgrade process.

[https://github.com/apache/flink/blob/d54807ba10d0392a60663f030f9fe0bfa1c66754/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/slot/TaskSlotTable.java#L320]

If taskSlot.markFree() does not return true, at what point the taskSlot is 
marked as free?. I am not able to find the code where the slot is assigned as 
FREE.

 

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17560
>                 URL: https://issues.apache.org/jira/browse/FLINK-17560
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.8.3
>         Environment: Flink version 1.8.3
> Session cluster
>            Reporter: josson paul kalapparambath
>            Priority: Major
>
> Set up
> ------
> Flink version 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (Same Node)
> 1 TaskManager
> 4 pipelines running with various parallelisms
>
> Issue
> ------
> Occasionally, when the Job Manager gets restarted, we noticed that all the 
> pipelines are not getting scheduled. The error reported by the Job 
> Manager is 'not enough slots are available'. This should not be the case, 
> because the task manager was deployed with sufficient slots for the number of 
> pipelines/parallelism we have.
> We further noticed that the slot report sent by the task manager contains slots 
> filled with old CANCELLED job IDs. I am not sure why the task manager still 
> holds the details of the old jobs. A thread dump on the task manager confirms 
> that the old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865, but that is 
> not the issue happening in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
