I put a few comments in-line below...

On Tue, Jul 5, 2016 at 4:06 PM, Chen Qin <qinnc...@gmail.com> wrote:
> Hi there,
>
> I understand Flink currently doesn't support handling late arriving
> events. In reality, an exactly-once Flink job needs to deal with missing
> data or backfill from time to time without rewinding to a previous
> savepoint, which implies the restored job suffers a blackout while it
> tries to catch up.

Another way to do this is to kick off a parallel job to do the backfill
from the previous savepoint without stopping the current "realtime" job.
This way you would not have a "blackout". This assumes your final sink can
tolerate some parallel writes to it, OR you have two different sinks and
throw a switch from one to the other for downstream jobs, etc.

> In general, I don't know if there are good solutions for all of these
> scenarios. Some might keep messages in windows longer (messages not
> purged yet). Some might kick off another pipeline just dealing with
> affected windows (messages already purged). What would be suggested
> patterns?

Of course, ideally you would just keep data in windows longer such that
you don't purge window state until you're sure there is no more data
coming. The problem with this approach in the real world is that you may
be wrong with whatever time you choose ;)

I would suggest doing the best job possible upfront by using an
appropriate watermark strategy to deal with most of the data, then
processing the truly late data with a separate path in the application
code. This "separate" path may have to deal with merging late data with
data that's already been written to the sink, but this is definitely
possible depending on the sink.

> For the keeping-messages-longer approach, we need to coordinate backfill
> messages and current ongoing messages so that windows don't fire without
> all of these messages. One challenge of this approach would be
> determining when all backfill messages have been processed. Ideally
> there would be a customized barrier that travels through the entire
> topology and tells windows the backfills are done.
> This works both for non-keyed and keyed streams. I don't think Flink
> supports this yet. Another way would be to use session window merging
> and extend the window purging time with some reasonable estimation.
> This approach is based on estimation and may add execution latency to
> those windows.
>
> Which would be the suggested way in general?
>
> Thanks,
> Chen

--
Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com
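Chen's customized-barrier idea can be sketched in a few lines of Python (again a toy, not anything Flink provides — `BACKFILL_DONE`, the channel model, and `run` are all hypothetical). The point is the alignment rule: an operator buffers per-key state and only fires once the backfill-done marker has been seen on every input channel, so no window fires without all the backfill data:

```python
BACKFILL_DONE = object()   # hypothetical marker traveling with the data

def run(stream, num_channels):
    """Toy barrier alignment. `stream` is a sequence of
    (channel_id, record) pairs, where a record is either a
    (key, value) tuple or the BACKFILL_DONE marker. Per-key sums
    are held back until the marker has arrived on all channels;
    records after that point would belong to the next epoch.
    Returns the fired aggregates, or None if alignment never
    completed (some channel's backfill is still in flight)."""
    pending = {}
    done = set()
    fired = None
    for ch, rec in stream:
        if rec is BACKFILL_DONE:
            done.add(ch)
            if len(done) == num_channels:
                fired = dict(pending)   # all backfills done: safe to fire
        else:
            key, value = rec
            pending[key] = pending.get(key, 0) + value
    return fired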