[jira] [Created] (FLINK-3256) Invalid execution graph cleanup for jobs with colocation groups

Paris Carbone (JIRA) Mon, 18 Jan 2016 10:53:27 -0800

Paris Carbone created FLINK-3256:
------------------------------------

             Summary: Invalid execution graph cleanup for jobs with colocation groups
                 Key: FLINK-3256
                 URL: https://issues.apache.org/jira/browse/FLINK-3256
             Project: Flink
          Issue Type: Bug
          Components: Distributed Runtime
            Reporter: Paris Carbone
            Assignee: Paris Carbone
            Priority: Blocker



Currently, upon restarting an execution graph, we clean up the colocation 
constraints per ExecutionJobVertex, for the group that each vertex belongs to.

This can lead to invalid reconfiguration upon a restart or any other activity 
that relies on state cleanup of the execution graph. For example, upon 
restarting a DataStream job with iterations, the following steps are executed:

1) IterationSource colocation group constraints are reset
2) New IterationSource colocation group constraints are generated
3) IterationSource subtasks are scheduled with current colocation constraints
4) IterationSink colocation group constraints are reset
5) New IterationSink colocation group constraints are generated
6) IterationSink subtasks are scheduled with different colocation constraints, 
so they are not colocated with the sources and also demand more slots from the 
scheduler.

This can be trivially fixed by resetting colocation groups independently of 
ExecutionJobVertices, thus updating them once per reconfiguration.
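
The difference between the two reset strategies can be sketched with a small 
standalone model. This is not Flink code: Group, Vertex, restartPerVertex and 
restartPerGroup are hypothetical names invented for illustration, and the model 
assumes that a vertex picks up whatever constraints its group holds at the 
moment the vertex is scheduled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the problem; none of these types are Flink classes.
final class ColocationResetSketch {

    /** Stand-in for a colocation group: one constraint object per subtask index. */
    static final class Group {
        private final Object[] constraints;

        Group(int parallelism) {
            constraints = new Object[parallelism];
            reset();
        }

        /** Discards the old constraints and generates fresh ones. */
        void reset() {
            for (int i = 0; i < constraints.length; i++) {
                constraints[i] = new Object();
            }
        }

        Object constraintFor(int subtask) {
            return constraints[subtask];
        }

        int parallelism() {
            return constraints.length;
        }
    }

    /** Stand-in for a job vertex; records the constraint each subtask was scheduled with. */
    static final class Vertex {
        final Group group;
        final Object[] scheduledWith;

        Vertex(Group group) {
            this.group = group;
            this.scheduledWith = new Object[group.parallelism()];
        }

        void schedule() {
            for (int i = 0; i < scheduledWith.length; i++) {
                scheduledWith[i] = group.constraintFor(i);
            }
        }
    }

    /** Behaviour described in the report: the shared group is reset once per vertex. */
    static void restartPerVertex(List<Vertex> vertices) {
        for (Vertex v : vertices) {
            v.group.reset();
            v.schedule();
        }
    }

    /** Proposed behaviour: every distinct group is reset exactly once, then all vertices are scheduled. */
    static void restartPerGroup(List<Vertex> vertices) {
        Set<Group> groups = new LinkedHashSet<>();
        for (Vertex v : vertices) {
            groups.add(v.group);
        }
        for (Group g : groups) {
            g.reset();
        }
        for (Vertex v : vertices) {
            v.schedule();
        }
    }

    /** Two vertices are colocated if each subtask pair shares the same constraint object. */
    static boolean colocated(Vertex a, Vertex b) {
        for (int i = 0; i < a.scheduledWith.length; i++) {
            if (a.scheduledWith[i] != b.scheduledWith[i]) {
                return false;
            }
        }
        return true;
    }

    /** Builds a source and a sink vertex that share one group with parallelism 2. */
    static List<Vertex> iterationJob() {
        Group shared = new Group(2);
        List<Vertex> vertices = new ArrayList<>();
        vertices.add(new Vertex(shared)); // plays the role of IterationSource
        vertices.add(new Vertex(shared)); // plays the role of IterationSink
        return vertices;
    }

    static boolean demoPerVertex() {
        List<Vertex> job = iterationJob();
        restartPerVertex(job);
        return colocated(job.get(0), job.get(1));
    }

    static boolean demoPerGroup() {
        List<Vertex> job = iterationJob();
        restartPerGroup(job);
        return colocated(job.get(0), job.get(1));
    }

    public static void main(String[] args) {
        System.out.println("per-vertex reset colocated: " + demoPerVertex()); // false
        System.out.println("per-group reset colocated: " + demoPerGroup());   // true
    }
}
```

In this model the per-vertex variant leaves source and sink holding different 
constraint objects, which corresponds to steps 1-6 above, while the per-group 
variant resets the shared group once so both vertices see the same constraints.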



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
