Hi everybody!

I'm so excited to be here asking a first question about Flink DataStream
API.

I have a basic enrichment pipeline (event time). Basically, there's a main
stream A (Kafka source) being enriched with the info of 2 other streams: B
and C. (Kafka sources as well).
Basically, the enrichment graph consists in 2 stages:

1. The stream A is enriched with stream B, resulting in stream (A, B)
2. The stream (A, B) is enriched with C, resulting in a stream (A, B, C).


I've created a side output for late events A. The flow for these events
would be slightly different:

1. The stream A is enriched by fetching the info from an external service
using Flin Async I/O, resulting in stream (A, B)
2. Then, the stream (A, B) is enriched with C, resulting in a stream (A, B,
C) (same before)


Note that late A events can arrive out of order (weeks in same cases)


I was wondering where the late-events flow should be defined. One graph/job
for both flows or 2 graph/jobs with different times? What's the general
pattern for cases like this?



Many thanks, Jose Velasco

Reply via email to