Hi

I have to design a Spark Streaming application for the use case below, and I am
looking for the best possible approach.

I have an application that pushes data into 1000+ different Kafka topics, each
with a different purpose. Spark Streaming will receive the data from each topic
and, after processing, write it back to the corresponding output topic.

Ex.

Input Type 1 Topic  --> Spark Streaming --> Output Type 1 Topic
Input Type 2 Topic  --> Spark Streaming --> Output Type 2 Topic
Input Type 3 Topic  --> Spark Streaming --> Output Type 3 Topic
.
.
.
Input Type N Topic  --> Spark Streaming --> Output Type N Topic  and so on.
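
To make the picture concrete, this is roughly the per-topic skeleton I have in
mind today, written against the Kafka 0.10 direct API
(spark-streaming-kafka-0-10). The topic names, broker address, batch interval
and the toUpperCase step are just placeholders for my real configuration and
processing logic:

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object SingleTopicJob {
  def main(args: Array[String]): Unit = {
    val inputTopic  = "input-type-1"   // placeholder; the real pair would come from config
    val outputTopic = "output-type-1"  // placeholder

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> s"spark-$inputTopic",
      "auto.offset.reset"  -> "latest"
    )

    // Producer config for writing the results back to Kafka.
    val producerProps = new java.util.Properties()
    producerProps.put("bootstrap.servers", "broker1:9092")
    producerProps.put("key.serializer", classOf[StringSerializer].getName)
    producerProps.put("value.serializer", classOf[StringSerializer].getName)

    val ssc = new StreamingContext(
      new SparkConf().setAppName(s"stream-$inputTopic"), Seconds(10))

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq(inputTopic), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One producer per partition per batch; pooling it would be better in practice.
        val producer = new KafkaProducer[String, String](producerProps)
        records.foreach { r =>
          val processed = r.value.toUpperCase  // stand-in for my real processing logic
          producer.send(new ProducerRecord(outputTopic, r.key, processed))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

If I go with one application per topic, I would have to launch 1000+ copies of
this with different topic pairs, which is why I am asking.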

I need answers to the following questions.

1. Is it a good idea to launch 1000+ Spark Streaming applications, one per
topic? Or should I have one streaming application for all topics, since the
processing logic is going to be the same?
2. If there is one streaming context, how will I determine which RDD belongs
to which Kafka topic, so that after processing I can write it back to its
corresponding output topic? (A rough sketch of the only idea I have so far is
after this list.)
3. The client may add/delete topics in Kafka. How do I handle that dynamically
in Spark Streaming?
4. How do I restart the job automatically on failure?
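
For question 2, this is the only idea I have so far: subscribe a single direct
stream to the full topic list and take the source topic from each
ConsumerRecord, deriving the output topic from it. The "-out" suffix is just an
assumed naming convention, and loadTopicListFromConfig() is a hypothetical
helper of mine; the sketch reuses ssc, kafkaParams and producerProps from the
code above. Will this scale to 1000+ topics, or is there a better pattern?

// Reuses ssc, kafkaParams and producerProps from the sketch above.
val allInputTopics: Seq[String] = loadTopicListFromConfig()  // hypothetical helper

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](allInputTopics, kafkaParams))

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val producer = new KafkaProducer[String, String](producerProps)
    records.foreach { r =>
      // Each ConsumerRecord carries its source topic, so routing is per record.
      val outputTopic = r.topic + "-out"  // assumed naming convention
      producer.send(new ProducerRecord(outputTopic, r.key, r.value.toUpperCase))
    }
    producer.close()
  }
}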

Do you see any other issues here?

I highly appreciate your response.

Thanks
Shashi
