Hi,

I am Manasa, currently working on a project that requires processing data
from multiple topics at the same time. I am looking for advice on how to
approach this problem. Below is the use case.


We have 4 topics, with data coming in at a different rate in each topic,
but the messages in each topic share a common unique identifier
(attributionId). I need to process all the events in the 4 topics with the
same attributionId at the same time. We are currently using Spark Streaming
for processing.

Here are the steps in the current logic:

1. Read and filter data in topic 1
2. Read and filter data in topic 2
3. Read and filter data in topic 3
4. Read and filter data in topic 4
5. Union the DStreams from steps 1-4, which are executed in parallel
6. Process the unified DStream
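
As a rough illustration of the steps above, here is a minimal sketch in plain
Python rather than the Spark API. The filter condition, the `topic` field, and
the sample records are placeholders I made up for the example, not the real
job; only attributionId comes from our data.

```python
# Plain-Python stand-in for the six steps; this is NOT the Spark API.
# Each "topic batch" is a list of dicts for one batch window.
# The filter rule and the "topic" field are hypothetical.

def read_and_filter(batch):
    """Steps 1-4: keep only records that carry an attributionId (assumed filter)."""
    return [e for e in batch if e.get("attributionId") is not None]

def union(*batches):
    """Step 5: concatenate the filtered batches of one batch window."""
    merged = []
    for b in batches:
        merged.extend(b)
    return merged

def process(unified):
    """Step 6: group the unified batch by attributionId."""
    grouped = {}
    for e in unified:
        grouped.setdefault(e["attributionId"], []).append(e["topic"])
    return grouped

# One batch window of made-up data
topic1 = [{"topic": 1, "attributionId": "a1"}, {"topic": 1, "attributionId": None}]
topic2 = [{"topic": 2, "attributionId": "a1"}]
topic3 = [{"topic": 3, "attributionId": "a2"}]
topic4 = [{"topic": 4, "attributionId": "a1"}]

filtered = [read_and_filter(t) for t in (topic1, topic2, topic3, topic4)]
result = process(union(*filtered))
print(result)
```

This only works as intended when all events for an attributionId fall into the
same batch window, which is the assumption that breaks for us.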

However, since the data arrives at different rates (topic 1 generates 1000
times more data than topic 2), the associated data does not arrive in the
same batch window.
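
To make the mismatch concrete, here is a small plain-Python sketch. The batch
interval, the arrival times, and the ids are all invented for illustration;
they are not measurements from our system.

```python
# Made-up arrival times (seconds) to illustrate the batch-window mismatch.
# BATCH_INTERVAL and every value below are assumptions, not real data.
BATCH_INTERVAL = 10  # hypothetical batch window in seconds

def batch_index(arrival_time):
    """Index of the micro-batch an event falls into, by arrival time."""
    return int(arrival_time // BATCH_INTERVAL)

# (topic, attributionId, arrival time in seconds)
events = [
    (1, "a1", 1.0),
    (2, "a1", 27.5),  # same id, but it arrives two windows later
    (1, "a2", 3.2),
    (2, "a2", 4.1),
]

batches_per_id = {}
for topic, attribution_id, arrival in events:
    batches_per_id.setdefault(attribution_id, set()).add(batch_index(arrival))

# Ids whose events are spread over more than one batch cannot be
# processed together by a union that only sees one batch at a time.
split_ids = sorted(i for i, b in batches_per_id.items() if len(b) > 1)
print(split_ids)
```

In this toy data, the events for "a1" land in two different batches, so the
per-batch union never sees them together.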

Any ideas on how this can be implemented would help.

Thank you!!

-Manasa
