I would recommend trying out Kafka's Streams API instead of Spark Streaming.

http://docs.confluent.io/current/streams/index.html
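
For example, here is a minimal sketch using the Streams DSL (the
StreamsBuilder API; topic names, serdes, the reducer, and the
attributionId extractor are placeholders, not a definitive
implementation). All four topics are consumed as one stream, each record
is re-keyed by its attributionId, and everything sharing an id is
combined regardless of source topic and arrival rate:

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class AttributionStreamsSketch {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "attribution-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
        Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
        Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();

    // Consume all four topics as a single stream (topic names are
    // placeholders).
    KStream<String, String> events =
        builder.stream(Arrays.asList("topic-1", "topic-2", "topic-3",
            "topic-4"));

    events
        // Re-key every record by its attributionId so records from all
        // topics with the same id land on the same partition.
        .selectKey((key, value) -> extractAttributionId(value))
        .groupByKey()
        // Combine all records that share an attributionId; this reducer
        // is a stand-in for the real merge logic.
        .reduce((agg, next) -> agg + "|" + next)
        .toStream()
        .to("attribution-merged");

    new KafkaStreams(builder.build(), props).start();
  }

  // Hypothetical helper: pull the attributionId out of the payload.
  private static String extractAttributionId(String value) {
    return value.split(",")[0];  // assumes the id is the first CSV field
  }
}

Because the state store buffers partial results per key, slow topics can
catch up with fast ones without any batch-window alignment.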

-Matthias


On 3/20/17 11:32 AM, Ali Akhtar wrote:
> Are you saying that it should process all messages from topic 1, then
> topic 2, then topic 3, then 4?
> 
> Or that they need to be processed exactly at the same time?
> 
> On Mon, Mar 20, 2017 at 10:05 PM, Manasa Danda <manasada...@gmail.com>
> wrote:
> 
>> Hi,
>>
>> I am Manasa, currently working on a project that requires processing data
>> from multiple topics at the same time. I am looking for advice on how to
>> approach this problem. Below is the use case.
>>
>>
>> We have 4 topics, with data coming in at a different rate in each topic,
>> but the messages in each topic share a common unique identifier
>> (attributionId). I need to process all the events in the 4 topics with the
>> same attributionId at the same time. We are currently using Spark
>> Streaming for processing.
>>
>> Here are the steps in the current logic; a rough sketch follows the list.
>>
>> 1. Read and filter data in topic 1
>> 2. Read and filter data in topic 2
>> 3. Read and filter data in topic 3
>> 4. Read and filter data in topic 4
>> 5. Union the DStreams from steps 1-4, which run in parallel
>> 6. Process the unified DStream
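>>
>> A rough sketch of these steps (assuming the spark-streaming-kafka-0-10
>> integration; topic names, the filter, and the key extraction below are
>> placeholders):
>>
>> import java.util.ArrayList;
>> import java.util.Arrays;
>> import java.util.Collections;
>> import java.util.HashMap;
>> import java.util.List;
>> import java.util.Map;
>>
>> import org.apache.kafka.clients.consumer.ConsumerRecord;
>> import org.apache.kafka.common.serialization.StringDeserializer;
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.streaming.Durations;
>> import org.apache.spark.streaming.api.java.JavaInputDStream;
>> import org.apache.spark.streaming.api.java.JavaPairDStream;
>> import org.apache.spark.streaming.api.java.JavaStreamingContext;
>> import org.apache.spark.streaming.kafka010.ConsumerStrategies;
>> import org.apache.spark.streaming.kafka010.KafkaUtils;
>> import org.apache.spark.streaming.kafka010.LocationStrategies;
>>
>> import scala.Tuple2;
>>
>> public class AttributionUnionJob {
>>
>>   public static void main(String[] args) throws InterruptedException {
>>     SparkConf conf = new SparkConf().setAppName("attribution-union");
>>     JavaStreamingContext jssc =
>>         new JavaStreamingContext(conf, Durations.seconds(10));
>>
>>     Map<String, Object> kafkaParams = new HashMap<>();
>>     kafkaParams.put("bootstrap.servers", "localhost:9092");
>>     kafkaParams.put("key.deserializer", StringDeserializer.class);
>>     kafkaParams.put("value.deserializer", StringDeserializer.class);
>>     kafkaParams.put("group.id", "attribution-job");
>>
>>     // Steps 1-4: read and filter each topic, keyed by attributionId.
>>     List<JavaPairDStream<String, String>> perTopic = new ArrayList<>();
>>     for (String topic : Arrays.asList("topic-1", "topic-2", "topic-3",
>>         "topic-4")) {
>>       JavaInputDStream<ConsumerRecord<String, String>> raw =
>>           KafkaUtils.createDirectStream(
>>               jssc,
>>               LocationStrategies.PreferConsistent(),
>>               ConsumerStrategies.<String, String>Subscribe(
>>                   Collections.singletonList(topic), kafkaParams));
>>       perTopic.add(raw
>>           .mapToPair(r -> new Tuple2<>(r.key(), r.value()))
>>           // assumes the record key is the attributionId
>>           .filter(t -> t._2() != null));  // stand-in filter
>>     }
>>
>>     // Step 5: union the four streams.
>>     JavaPairDStream<String, String> unified = perTopic.get(0);
>>     for (int i = 1; i < perTopic.size(); i++) {
>>       unified = unified.union(perTopic.get(i));
>>     }
>>
>>     // Step 6: process all events sharing an attributionId per batch.
>>     unified.groupByKey().print();
>>
>>     jssc.start();
>>     jssc.awaitTermination();
>>   }
>> }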
>>
>> However, since the data arrives at different rates (topic 1 generates
>> 1000 times more data than topic 2), the associated events do not land in
>> the same batch window.
>>
>> Any ideas on how this could be implemented would help.
>>
>> Thank you!!
>>
>> -Manasa
>>
> 
