I would recommend trying out Kafka's Streams API instead of Spark Streaming:
http://docs.confluent.io/current/streams/index.html
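As a rough sketch of what that could look like (not code from this thread: the topic names, serde config, and the extractAttributionId helper are placeholders for illustration), one stream can subscribe to all four topics and re-key every record by its attributionId:

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class AttributionStreams {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "attribution-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Subscribe to all four topics as a single stream; Kafka Streams
        // picks the next record by timestamp, so topics producing at very
        // different rates still get interleaved in rough event-time order.
        KStream<String, String> events =
                builder.stream(Arrays.asList("topic1", "topic2", "topic3", "topic4"));

        // Re-key every record by its attributionId so related events from
        // all four topics land on the same partition and can be grouped.
        events.selectKey((key, value) -> extractAttributionId(value))
              .groupByKey()
              .count();  // placeholder aggregation; replace with the real processing

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }

    // Hypothetical helper: how the attributionId is parsed out of the
    // payload depends on the actual message format.
    private static String extractAttributionId(String value) {
        return value.split(",")[0];
    }
}

Because Kafka Streams synchronizes its inputs on record timestamps rather than on fixed batch windows, events sharing an attributionId can be grouped even when one topic produces far faster than another, which is one reason it may fit this use case better.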
-Matthias

On 3/20/17 11:32 AM, Ali Akhtar wrote:
> Are you saying that it should process all messages from topic 1, then
> topic 2, then topic 3, then 4?
>
> Or that they need to be processed exactly at the same time?
>
> On Mon, Mar 20, 2017 at 10:05 PM, Manasa Danda <manasada...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am Manasa, currently working on a project that requires processing
>> data from multiple topics at the same time. I am looking for advice on
>> how to approach this problem. Below is the use case.
>>
>> We have 4 topics, with data coming in at a different rate in each
>> topic, but the messages in each topic share a common unique identifier
>> (attributionId). I need to process all the events in the 4 topics with
>> the same attributionId at the same time. We are currently using Spark
>> Streaming for processing.
>>
>> Here are the steps in the current logic:
>>
>> 1. Read and filter data in topic 1
>> 2. Read and filter data in topic 2
>> 3. Read and filter data in topic 3
>> 4. Read and filter data in topic 4
>> 5. Union the DStreams from steps 1-4, which were executed in parallel
>> 6. Process the unified DStream
>>
>> However, since the data is coming in at different rates (topic 1 is
>> generating 1000 times more data than topic 2), the associated data does
>> not arrive in the same batch window.
>>
>> Any ideas on how this can be implemented would help.
>>
>> Thank you!!
>>
>> -Manasa
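For reference, the union pipeline described in steps 1-6 above could be sketched roughly as follows, assuming the Kafka 0.10 direct stream integration; the topic names, Kafka parameters, and extractAttributionId helper are placeholders, and the per-topic filters from steps 1-4 are elided:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class UnionPipeline {

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("attribution-union");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "attribution-union");

        // Steps 1-4: one stream per topic (filtering elided for brevity).
        JavaDStream<String> s1 = streamFor(jssc, kafkaParams, "topic1");
        JavaDStream<String> s2 = streamFor(jssc, kafkaParams, "topic2");
        JavaDStream<String> s3 = streamFor(jssc, kafkaParams, "topic3");
        JavaDStream<String> s4 = streamFor(jssc, kafkaParams, "topic4");

        // Step 5: union the four DStreams into one.
        JavaDStream<String> unified = s1.union(s2).union(s3).union(s4);

        // Step 6: group each batch by attributionId and process the groups.
        unified.mapToPair(v -> new Tuple2<>(extractAttributionId(v), v))
               .groupByKey()
               .print();  // placeholder for the real per-attributionId processing

        jssc.start();
        jssc.awaitTermination();
    }

    private static JavaDStream<String> streamFor(JavaStreamingContext jssc,
                                                 Map<String, Object> kafkaParams,
                                                 String topic) {
        return KafkaUtils.<String, String>createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.Subscribe(Arrays.asList(topic), kafkaParams))
                .map(ConsumerRecord::value);
    }

    // Hypothetical helper: pull the attributionId out of the payload.
    private static String extractAttributionId(String value) {
        return value.split(",")[0];
    }
}

The limitation described in the quoted mail is inherent to this design: union only merges whatever happens to land in the same micro-batch, so records with the same attributionId that arrive in different batch windows are never grouped together.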