Hi, I have to design a Spark Streaming application for the use case below, and I am looking for the best possible approach.
I have an application that pushes data into 1000+ different topics, each with a different purpose. Spark Streaming will receive data from each topic and, after processing, write it back to the corresponding output topic. For example:

Input Type 1 Topic --> Spark Streaming --> Output Type 1 Topic
Input Type 2 Topic --> Spark Streaming --> Output Type 2 Topic
Input Type 3 Topic --> Spark Streaming --> Output Type 3 Topic
...
Input Type N Topic --> Spark Streaming --> Output Type N Topic

and so on. I need to answer the following questions:

1. Is it a good idea to launch 1000+ Spark Streaming applications, one per topic? Or should I have one streaming application for all topics, since the processing logic is going to be the same?
2. If there is one streaming context, how will I determine which RDD belongs to which Kafka topic, so that after processing I can write it back to its corresponding output topic? (A rough sketch of what I mean is at the end of this post.)
3. The client may add/delete topics in Kafka; how do I handle that dynamically in Spark Streaming?
4. How do I restart the job automatically on failure?

Do you see any other issues here? I'd highly appreciate your response.

Thanks,
Shashi
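
P.S. To make question 2 concrete, below is a rough sketch of the single-application approach I am considering, using the spark-streaming-kafka-0-10 direct stream. The broker address, topic names, the input -> output naming convention, and the processing step are all placeholders rather than my real setup; the idea is to use record.topic() to route each message to its output topic.

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object MultiTopicPipeline {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MultiTopicPipeline")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder broker address and topic names
    val brokers     = "broker1:9092"
    val inputTopics = Seq("input-type-1", "input-type-2", "input-type-3")

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> brokers,
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "multi-topic-pipeline",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // One direct stream subscribed to every input topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](inputTopics, kafkaParams)
    )

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One producer per partition for simplicity; a cached/shared producer would be better
        val props = new Properties()
        props.put("bootstrap.servers", brokers)
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        records.foreach { record =>
          // record.topic() tells me which input topic the message came from,
          // so the output topic can be derived from it (placeholder naming convention)
          val outputTopic = record.topic().replace("input", "output")
          val processed   = record.value().toUpperCase // placeholder for the real processing logic
          producer.send(new ProducerRecord[String, String](outputTopic, record.key(), processed))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

For question 3, I was also looking at ConsumerStrategies.SubscribePattern, which takes a topic regex instead of a fixed list, but I am not sure whether topics created after the job starts get picked up without a restart.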