Hi Olle, what you are describing is indeed a problem in Flink. The solution to the problem would be to synchronize the event time across sources so that a source can throttle down when it realizes that it has advanced too far [1]. At the moment, this feature is in development, but not yet finished. I think the best solution right now is what you've actually done: Increase parallelism in order to spread the state load.
[1] https://issues.apache.org/jira/browse/FLINK-10886 Cheers, Till On Thu, Feb 14, 2019 at 5:37 PM Olle Noren <olle.no...@niradynamics.se> wrote: > Hi, > > > > We have a Flink job were we are trying to window join two datastreams > originating from two different Kafka topics, where one topic contains a lot > more data per time instance than the other one. > > We use event time processing, and this all works fine when running our > pipeline live, i.e. data is consumed and processed as soon as it is > ingested in Kafka. > > > > The problem though occurs in the scenario when we are replaying with data > stored in Kafka, then the watermarks of the “larger-stream” are lagging > behind the “smaller-stream” since this stream has less data per time unit > and then is advancing faster. > > This leads to a large state at the join operation since data from the > “smaller-stream” needs to be kept until the corresponding watermarks from > the “larger-stream” have passed. > > To avoid a very large state at the join operator, we have tried to > increase the parallelism for the consumer of the “larger-stream” to make > this keep up with the “smaller stream”, this decreases the size of the > state to some extent. This seems though like a ugly way to get around the > problem and will not work if the sizes of the two Kafka topics are changing > over time. > > > > Is there any way we can synchronize the reading of the Kafka sources based > on the watermarks we have in the two streams, i.e. to pause the reading of > the “smaller-topic” until the “larger-stream” has caught up? Any other > ideas how to handle this replay-scenario? > > > > Thanks in advance > > > > Olle > > > *Olle Noren* > *Systems Engineer* > Fleet Perception for Maintenance > *NIRA Dynamics AB* > Wallenbergs gata 4 > 58330 Linköping > Sweden Mobile: +46 709 748 304 > olle.no...@niradynamics.se > www.niradynamics.se > Together for smarter safety >