Hi Olle,

what you are describing is indeed a problem in Flink. The solution to the
problem would be to synchronize the event time across sources so that a
source can throttle down when it realizes that it has advanced too far [1].
At the moment, this feature is in development, but not yet finished. I
think the best solution right now is what you've actually done: Increase
parallelism in order to spread the state load.

[1] https://issues.apache.org/jira/browse/FLINK-10886

Cheers,
Till

On Thu, Feb 14, 2019 at 5:37 PM Olle Noren <olle.no...@niradynamics.se>
wrote:

> Hi,
>
>
>
> We have a Flink job were we are trying to window join two datastreams
> originating from two different Kafka topics, where one topic contains a lot
> more data per time instance than the other one.
>
> We use event time processing, and this all works fine when running our
> pipeline live, i.e. data is consumed and processed as soon as it is
> ingested in Kafka.
>
>
>
> The problem though occurs in the scenario when we are replaying with data
> stored in Kafka, then the watermarks of the “larger-stream” are lagging
> behind the “smaller-stream” since this stream has less data per time unit
> and then is advancing faster.
>
> This leads to a large state at the join operation since data from the
> “smaller-stream” needs to be kept until the corresponding watermarks from
> the “larger-stream” have passed.
>
> To avoid a very large state at the join operator, we have tried to
> increase the parallelism for the consumer of the “larger-stream” to make
> this keep up with the “smaller stream”, this decreases the size of the
> state to some extent. This seems though like a ugly way to get around the
> problem and will not work if the sizes of the two Kafka topics are changing
> over time.
>
>
>
> Is there any way we can synchronize the reading of the Kafka sources based
> on the watermarks we have in the two streams, i.e. to pause the reading of
> the “smaller-topic” until the “larger-stream” has caught up? Any other
> ideas how to handle this replay-scenario?
>
>
>
> Thanks in advance
>
>
>
> Olle
>
>
> *Olle Noren*
> *Systems Engineer*
> Fleet Perception for Maintenance
> *NIRA Dynamics AB*
> Wallenbergs gata 4
> 58330 Linköping
> Sweden Mobile: +46 709 748 304
> olle.no...@niradynamics.se
> www.niradynamics.se
> Together for smarter safety
>

Reply via email to