I want to implement a job which feeds sink messages back into a source, effectively introducing a cycle. For simplicity, consider the following Kafka-based enrichment scenario:
- A source for the "facts"
- A source for the "dimensions"
- A sink for the "dimensions" (this creates a loop)
- A co-process operator doing the enrichment join
- Finally, another sink for the final "enriched facts"

The thing is that, based on the "facts", the data in the dimensions table might change, and in that case I want to update both the internal state of the join operator (where the table is materialized) and emit the update to the sink (because another service needs that information, which in turn will potentially result in new facts coming in). So, to be clear, the feedback loop is not an internal one within the Flink job pipeline, but an external one through Kafka, and hence the graph topology continues to be a DAG (see the sketch in the P.S. below).

I was just wondering how common this use case is and which precautions, if any, need to be taken. In other libraries/frameworks this can be problematic, e.g.:

- https://github.com/lovoo/goka/issues/95

I guess that, for keeping the local state of the joiner in sync with the input/output Kafka topic, I would need to enable exactly-once guarantees at the sink level, e.g., by following this recipe:

- https://docs.immerok.cloud/docs/how-to-guides/development/exactly-once-with-apache-kafka-and-apache-flink/

Thanks in advance!

Salva
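P.S. Here is a minimal sketch of what I have in mind, assuming string payloads, a hypothetical broker address ("broker:9092"), topic names ("facts", "dimensions", "enriched-facts"), a leading "key," field for key extraction, and a placeholder deriveDimensionUpdate() for the domain logic. The feedback edge is a side output routed to a transactional "dimensions" sink:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class ExternalFeedbackEnrichmentJob {

    // Dimension updates leave through this side output towards the "dimensions"
    // sink, which feeds them back via Kafka into the "dimensions" source.
    static final OutputTag<String> DIM_UPDATES = new OutputTag<String>("dim-updates") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is a prerequisite for exactly-once Kafka transactions.
        env.enableCheckpointing(60_000);

        SingleOutputStreamOperator<String> enriched = env
                .fromSource(source("facts"), WatermarkStrategy.noWatermarks(), "facts")
                .keyBy(ExternalFeedbackEnrichmentJob::key)
                .connect(env
                        .fromSource(source("dimensions"), WatermarkStrategy.noWatermarks(), "dimensions")
                        .keyBy(ExternalFeedbackEnrichmentJob::key))
                .process(new EnrichmentJoin());

        enriched.sinkTo(sink("enriched-facts"));
        // The feedback edge goes through Kafka, so the job graph itself stays a DAG.
        enriched.getSideOutput(DIM_UPDATES).sinkTo(sink("dimensions"));

        env.execute("external-feedback-enrichment");
    }

    static class EnrichmentJoin extends KeyedCoProcessFunction<String, String, String, String> {
        private transient ValueState<String> dimension;

        @Override
        public void open(Configuration parameters) {
            // The dimension table, materialized per key inside the join operator.
            dimension = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("dimension", String.class));
        }

        @Override
        public void processElement1(String fact, Context ctx, Collector<String> out) throws Exception {
            out.collect(fact + "|" + dimension.value()); // the enriched fact
            // If the fact implies a dimension change, update the local state
            // immediately and emit the change to the feedback sink.
            String updated = deriveDimensionUpdate(fact, dimension.value());
            if (updated != null) {
                dimension.update(updated);
                ctx.output(DIM_UPDATES, updated);
            }
        }

        @Override
        public void processElement2(String dim, Context ctx, Collector<String> out) throws Exception {
            // Updates read from the dimensions topic, including our own
            // fed-back writes, (re)materialize the table here.
            dimension.update(dim);
        }
    }

    // Stand-ins for the parts the scenario leaves open.
    static String key(String value) {
        return value.split(",", 2)[0]; // assume a leading "key," field
    }

    static String deriveDimensionUpdate(String fact, String currentDim) {
        return null; // domain-specific; null means "no dimension change"
    }

    static KafkaSource<String> source(String topic) {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics(topic)
                .setGroupId("enrichment-" + topic)
                // Read only committed records, so that aborted transactions
                // from a failed job never leak back into the loop.
                .setProperty("isolation.level", "read_committed")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }

    static KafkaSink<String> sink(String topic) {
        return KafkaSink.<String>builder()
                .setBootstrapServers("broker:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic(topic)
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("enrichment-" + topic)
                .build();
    }
}
```

One consequence I can already see: with DeliveryGuarantee.EXACTLY_ONCE the fed-back records only become visible to read_committed consumers once the checkpoint that produced them completes, so each trip around the external loop takes at least one checkpoint interval, and the producer's transaction.timeout.ms must stay below the broker's transaction.max.timeout.ms.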