Re: Multiple rebalances are incorrectly ignored in some cases.

2020-04-27 Thread David Morávek
Hello Aljoscha, unfortunately not, I'm not really familiar with the optimizer code and it's really complex to debug :( this method is as far as I got - https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/dataproperties/RequestedGlobalProperties.jav

Re: Multiple rebalances are incorrectly ignored in some cases.

2020-04-27 Thread Aljoscha Krettek
On 27.04.20 09:34, David Morávek wrote: When we include `flatMap` in between rebalances -> `.rebalance().flatMap(...).rebalance()`, we need to reshuffle again, because dataset distribution may have changed (eg. you can possibli emit unbouded stream from a single element). Unfortunatelly `flatMap

Multiple rebalances are incorrectly ignored in some cases.

2020-04-27 Thread David Morávek
Hello Flinkers, we have run into unexpected behaviour with chained Reshuffles in Apache Beam's Flink runner (batch). In flink optimizer, when we `.rebalance()` dataset, is output channel is marked as `FORCED_REBALANCED`. When we chain this with another `.rebalance()`, the latter is ignored becaus