Hi, I am Manasa. I am currently working on a project that requires processing data from multiple topics at the same time, and I am looking for advice on how to approach this problem. The use case is below.
We have 4 topics, each receiving data at a different rate, but the messages in all of them share a common unique identifier (attributionId). I need to process all events from the 4 topics that have the same attributionId together. We are currently using Spark Streaming for processing. Here are the steps of the current logic:

1. Read and filter data in topic 1.
2. Read and filter data in topic 2.
3. Read and filter data in topic 3.
4. Read and filter data in topic 4.
5. Union the DStreams from steps 1-4 (which are executed in parallel).
6. Process the unified DStream.

However, because the data arrives at different rates (topic 1 generates 1000 times more data than topic 2), the related events for a given attributionId do not arrive in the same batch window. Any ideas on how this can be implemented would help.

Thank you!!
-Manasa
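The problem with the current logic, and one common direction for fixing it, can be illustrated with a minimal pure-Python sketch (this is not Spark code). Micro-batches are modeled as lists of hypothetical `(topic, attributionId, payload)` tuples. `group_per_batch` mimics the union-then-group approach and shows that events for the same attributionId are never matched when they land in different batches. `CrossBatchBuffer` instead keeps partial groups in per-key state across batches, which is the idea behind Spark Streaming's `updateStateByKey` and `mapWithState`. The topic names, the completion rule (at least one event from every topic) and the `timeout_batches` eviction policy are all assumptions for illustration, not requirements stated in the post.

```python
from collections import defaultdict

# Hypothetical topic names standing in for the four topics in the post.
TOPICS = ("topic1", "topic2", "topic3", "topic4")


def group_per_batch(batches):
    """Mimic the current logic: union the four streams, then group by
    attributionId only within each batch. Returns the ids that had events
    from all topics inside a single batch."""
    complete = []
    for batch in batches:
        topics_by_id = defaultdict(set)
        for topic, attribution_id, _payload in batch:
            topics_by_id[attribution_id].add(topic)
        for attribution_id, topics in topics_by_id.items():
            if len(topics) == len(TOPICS):
                complete.append(attribution_id)
    return complete


class CrossBatchBuffer:
    """Keep partial groups per attributionId across batches until every topic
    has contributed at least one event, or until the group times out.
    (Assumed completion rule and hypothetical timeout policy.)"""

    def __init__(self, timeout_batches=10):
        self.timeout_batches = timeout_batches  # hypothetical eviction policy
        # attribution_id -> (events grouped by topic, index of first batch seen)
        self.state = {}

    def process_batch(self, batch_index, batch):
        # Merge this batch's events into the per-key state.
        for topic, attribution_id, payload in batch:
            events, first_seen = self.state.get(attribution_id, ({}, batch_index))
            events.setdefault(topic, []).append(payload)
            self.state[attribution_id] = (events, first_seen)

        completed, expired = [], []
        for attribution_id, (events, first_seen) in list(self.state.items()):
            if len(events) == len(TOPICS):
                # Every topic is present: emit the group and drop its state.
                completed.append((attribution_id, events))
                del self.state[attribution_id]
            elif batch_index - first_seen >= self.timeout_batches:
                # Still incomplete after the timeout: evict to bound memory.
                expired.append(attribution_id)
                del self.state[attribution_id]
        return completed, expired


if __name__ == "__main__":
    batches = [
        [("topic1", "a", 1), ("topic1", "b", 2), ("topic3", "a", 3)],
        [("topic2", "a", 4), ("topic4", "a", 5), ("topic1", "c", 6)],
        [("topic2", "b", 7)],
    ]
    # No attributionId has events from all four topics within one batch.
    print(group_per_batch(batches))
    buffer = CrossBatchBuffer(timeout_batches=2)
    for i, batch in enumerate(batches):
        print(i, buffer.process_batch(i, batch))
```

With this sample data, the per-batch grouping finds no complete attributionId, while the buffer completes `"a"` in the second batch and evicts `"b"` once it exceeds the timeout. In Spark the per-key state would be managed by the framework rather than a Python dict; some form of timeout is likely needed so that state does not grow without bound when an attributionId never shows up in one of the slower topics.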