[ https://issues.apache.org/jira/browse/FLINK-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-2824: ---------------------------------- Labels: auto-deprioritized-major stale-minor (was: auto-deprioritized-major) I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issues has been marked as Minor but is unassigned and neither itself nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label or in 7 days the issue will be deprioritized. > Iteration feedback partitioning does not work as expected > --------------------------------------------------------- > > Key: FLINK-2824 > URL: https://issues.apache.org/jira/browse/FLINK-2824 > Project: Flink > Issue Type: Bug > Components: API / DataStream > Affects Versions: 0.10.0 > Reporter: Gyula Fora > Priority: Minor > Labels: auto-deprioritized-major, stale-minor > > Iteration feedback partitioning is not handled transparently and can cause > serious issues if the user does not know the specific implementation details > of streaming iterations (which is not a realistic expectation). > Example: > IterativeStream it = ... (parallelism 1) > DataStream mapped = it.map(...) (parallelism 2) > // this does not work as the feedback has parallelism 2 != 1 > // it.closeWith(mapped.partitionByHash(someField)) > // so we need rebalance the data > it.closeWith(mapped.map(NoOpMap).setParallelism(1).partitionByHash(someField)) > This program will execute but the feedback will not be partitioned by hash to > the mapper instances: > The partitioning will be set from the noOpMap to the iteration sink which has > parallelism different from the mapper (1 vs 2) and then the iteration source > forwards the element to the mapper (always to 0). > So the problem is basically that the iteration source/sink pair gets the > parallelism of the input stream (p=1) not the head operator (p = 2) which > leads to incorrect partitioning. > Workaround: > Set input parallelism to the same as the head operator > Suggested solution: > The iteration construction should be reworked to set the parallelism of the > source/sink to the parallelism of the head operator (and validate that all > heads have the same parallelism) -- This message was sent by Atlassian Jira (v8.20.1#820001)