[ https://issues.apache.org/jira/browse/FLINK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402303#comment-15402303 ]
Till Rohrmann commented on FLINK-4296:
--------------------------------------

I found the issue. The problem is that a scheduling failure of a consumer task in {{Execution.scheduleOrUpdateConsumer}} fails the current {{Execution}} and not the consuming {{Execution}}. If the current {{Execution}} is already in a final state (e.g. blocking data exchange), then the failure is simply ignored. I propose to fail the consuming {{Execution}} instead of the producing {{Execution}} in order to fail the job.

> Scheduler accepts more tasks than it has task slots available
> -------------------------------------------------------------
>
>                 Key: FLINK-4296
>                 URL: https://issues.apache.org/jira/browse/FLINK-4296
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager, TaskManager
>    Affects Versions: 1.1.0
>            Reporter: Maximilian Michels
>            Assignee: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.1.0, 1.2.0
>
>
> Flink's scheduler doesn't support queued scheduling but expects to find all
> necessary task slots upon scheduling. If it does not it throws an error. Due
> to some changes in the latest master, this seems to be broken.
> Flink accepts jobs with {{parallelism > total number of task slots}},
> schedules and deploys tasks in all available task slots, and leaves the
> remaining tasks lingering forever.
> Easy to reproduce:
> {code}
> ./bin/flink run -p TASK_SLOTS+n
> {code}
> where {{TASK_SLOTS}} is the number of total task slots of the cluster and
> {{n>=1}}.
> Here, {{p=11}}, {{TASK_SLOTS=10}}:
> {{bin/flink run -p 11 examples/batch/EnumTriangles.jar}}
> {noformat}
> Cluster configuration: Standalone cluster with JobManager at
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Executing EnumTriangles example with default edges data set.
> Use --edges to specify file input.
> Printing result to stdout. Use --output to specify output path.
> Submitting job with JobID: cd0c0b4cbe25643d8d92558168cfc045. Waiting for job
> completion.
> 08/01/2016 12:12:12 Job execution switched to status RUNNING.
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to SCHEDULED
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to DEPLOYING
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to RUNNING
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to FINISHED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(1/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(1/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(2/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(2/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(3/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(3/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(4/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(4/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(5/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(5/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(6/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(6/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(7/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(7/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(9/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(9/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(10/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(10/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(8/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(11/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(1/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(2/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(3/11)
> switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(7/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(6/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(9/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(10/11)
> switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(4/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(5/11)
> switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to RUNNING
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to FINISHED
> {noformat}
> For {{8/11}}, the {{Join}} task switches to RUNNING, but the {{GroupReduce}}
> does not:
> {noformat}
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to DEPLOYING
> ....
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(8/11) switched to SCHEDULED
> ....
> {08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to RUNNING}}
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
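
The failure mode described in the comment at the top of this message can be illustrated with a small toy model. This is only a sketch under stated assumptions: {{ToyExecution}}, {{State}} and {{reportSchedulingFailure}} are hypothetical names invented for illustration and are not Flink's actual classes or methods. It shows why reporting the scheduling failure to a producer that is already in a final state has no effect, while reporting it to the consumer does.

```java
/** Toy model of the reported behaviour; all names are hypothetical, not Flink's API. */
public class ConsumerFailureSketch {

    enum State { SCHEDULED, RUNNING, FINISHED, FAILED }

    /** Hypothetical stand-in for a single execution attempt of a task. */
    static final class ToyExecution {
        final String name;
        State state;

        ToyExecution(String name, State state) {
            this.name = name;
            this.state = state;
        }

        /** Marks the attempt as failed unless it is already in a final state. */
        boolean fail() {
            if (state == State.FINISHED || state == State.FAILED) {
                return false; // final state: the failure is silently dropped
            }
            state = State.FAILED;
            return true;
        }
    }

    /**
     * Simulates a consumer that could not be scheduled (e.g. no free slot).
     * failConsumer == false models the behaviour described as buggy,
     * failConsumer == true models the proposed fix.
     * Returns true if the failure actually took effect.
     */
    static boolean reportSchedulingFailure(
            ToyExecution producer, ToyExecution consumer, boolean failConsumer) {
        ToyExecution target = failConsumer ? consumer : producer;
        return target.fail();
    }

    public static void main(String[] args) {
        ToyExecution producer = new ToyExecution("producer", State.FINISHED);
        ToyExecution consumer = new ToyExecution("consumer", State.SCHEDULED);

        boolean oldBehaviour = reportSchedulingFailure(producer, consumer, false);
        System.out.println("fail producer took effect: " + oldBehaviour
                + ", consumer state: " + consumer.state);

        boolean proposed = reportSchedulingFailure(producer, consumer, true);
        System.out.println("fail consumer took effect: " + proposed
                + ", consumer state: " + consumer.state);
    }
}
```

In this model the consumer stays in SCHEDULED when the failure is sent to the finished producer, which matches the lingering task in the log above; whether the real fix looks like this depends on the actual patch.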