I put a few comments in-line below...

On Tue, Jul 5, 2016 at 4:06 PM, Chen Qin <qinnc...@gmail.com> wrote:
> Hi there,
>
> I understand Flink currently doesn't support handling late arriving
> events. In reality, an exactly-once Flink job needs to deal with missing
> data or backfill from time to time without rewinding to a previous
> savepoint, which implies the restored job suffers a blackout while it
> tries to catch up.

Another way to do this is to kick off a parallel job to do the backfill
from the previous savepoint without stopping the current "realtime" job.
This way you would not have a "blackout". This assumes your final sink can
tolerate some parallel writes to it, OR you have two different sinks and
throw a switch from one to the other for downstream jobs, etc.

> In general, I don't know if there are good solutions for all of these
> scenarios. Some might keep messages in windows longer (messages not
> purged yet). Some might kick off another pipeline just dealing with
> affected windows (messages already purged). What would be suggested
> patterns?

Of course, ideally you would just keep data in windows longer such that
you don't purge window state until you're sure there is no more data
coming. The problem with this approach in the real world is that you may
be wrong with whatever time you choose ;)

I would suggest doing the best job possible upfront by using an
appropriate watermark strategy to deal with most of the data, then
processing the truly late data with a separate path in the application
code. This "separate" path may have to deal with merging late data with
data that's already been written to the sink, but this is definitely
possible depending on the sink.

> For the keeping-messages-longer approach, we need to coordinate backfill
> messages and current ongoing messages so that windows don't fire without
> all of these messages. One challenge of this approach would be
> determining when all backfill messages have been processed. Ideally
> there would be a customized barrier that travels through the entire
> topology and tells windows the backfills are done.
> This works both for non-keyed and keyed streams. I don't think Flink
> supports this yet. Another way would be to use session window merging
> and extend the window purging time with some reasonable estimation.
> This approach is based on estimation and may add execution latency to
> those windows.
>
> Which would be the suggested way in general?
>
> Thanks,
> Chen

--
Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com
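Chen's customized-barrier idea can be sketched in a few lines of Python (again a toy, not anything Flink provides — `BACKFILL_DONE`, the channel model, and `run` are all hypothetical). The point is the alignment rule: an operator buffers per-key state and only fires once the backfill-done marker has been seen on every input channel, so no window fires without all the backfill data:

```python
BACKFILL_DONE = object()   # hypothetical marker traveling with the data

def run(stream, num_channels):
    """Toy barrier alignment. `stream` is a sequence of
    (channel_id, record) pairs, where a record is either a
    (key, value) tuple or the BACKFILL_DONE marker. Per-key sums
    are held back until the marker has arrived on all channels;
    records after that point would belong to the next epoch.
    Returns the fired aggregates, or None if alignment never
    completed (some channel's backfill is still in flight)."""
    pending = {}
    done = set()
    fired = None
    for ch, rec in stream:
        if rec is BACKFILL_DONE:
            done.add(ch)
            if len(done) == num_channels:
                fired = dict(pending)   # all backfills done: safe to fire
        else:
            key, value = rec
            pending[key] = pending.get(key, 0) + value
    return fired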