Hi, Imagine I have one class having 4 fields: ID, A, B, C. There are three data sources providing data in the form of (ID, A), (ID, B), (ID, C) respectively. I want to join these three data sources to get final (ID, A, B, C) without any window. For example, (ID, A) could come one month after (ID, B). Such joining needs global states. There are two designs in my mind.
1. Stream connect with separated kafka topic streamA_B = DataSourceA connect DataSourceB streamA_B_C = streamA_B connect DataSourceC Each data source is ingested via a dedicated kafka topic. This design seems not scalable because I need N stream connect operations for N+1 data sources. Each stream connect needs to maintain a global state. For example, streamA_B needs a global state for maintaining (ID, A, B) and streamA_B_C needs another for maintaining (ID, A, B, C). 2. Shared kafka topic All data sources are ingested via a shared kafka topic (using union event type or schema reference). Then one Flink job can handle all events from these data sources by maintaining one global state. This design seems more scalable than solution 1. Which one is recommended? Is there a better way that is missed? Appreciate very much for any hints!