You can probably try the Low Level Consumer from spark-packages (http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).
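For illustration only, here is a rough sketch of the receiver-per-topic idea (explained further below) using the stock high-level KafkaUtils.createStream receiver that ships with spark-streaming-kafka; the ZooKeeper quorum, consumer group, topic names and receiver counts are just placeholders, and the low-level consumer above has its own launcher, described in its read-me.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// 2 second batches, matching the batch interval mentioned in the quoted mail below
val ssc = new StreamingContext(new SparkConf().setAppName("multi-topic-kafka"), Seconds(2))

// Placeholder: how many receivers you want for each topic
val receiversPerTopic = Map("topicA" -> 1, "topicB" -> 2, "topicC" -> 3)

val perTopicStreams = receiversPerTopic.flatMap { case (topic, numReceivers) =>
  (1 to numReceivers).map { _ =>
    // Each createStream call starts one receiver, and each receiver occupies one executor core.
    // "zk1:2181" and "my-group" are placeholders for your ZooKeeper quorum and consumer group id.
    KafkaUtils.createStream(ssc, "zk1:2181", "my-group",
      Map(topic -> 1), StorageLevel.MEMORY_AND_DISK_SER)
  }
}.toSeq

// Union all per-topic streams into one DStream for downstream processing
val unioned = ssc.union(perTopicStreams)
unioned.count().print()

ssc.start()
ssc.awaitTermination()

Unioning the per-topic streams like this keeps the number of receivers under your control, at the cost of one executor core per receiver.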
How many partitions are there for your topics? Say you have 10 topics, each with 3 partitions; ideally you can then create at most 30 parallel Receivers and 30 streams. What I understand from your requirement is that, for any given topic, you want to choose the number of Receivers, e.g. 1 Receiver for Topic A, 2 for Topic B, 3 for Topic C, and so on. If you can distribute the topics across Receivers like this, you can very well use the above consumer, which has this facility. Each Receiver task takes one executor core, so you can plan your core allocation accordingly.

The implementation has a code example and a read-me file; if you wish to try it, you can always email me.

Regards,
Dibyendu

On Fri, Apr 17, 2015 at 3:06 PM, Laeeq Ahmed <laeeqsp...@yahoo.com.invalid> wrote:

> Hi,
>
> I am working with multiple Kafka streams (23 streams) and currently I am
> processing them separately. I receive one stream from each topic. I have
> the following questions.
>
> 1. The Spark Streaming guide suggests unioning these streams. *Is it
> possible to get statistics of each stream even after they are unioned?*
>
> 2. My calculations are not complex. I use a 2 second batch interval, and
> if I use 2 streams they are easily processed under 2 seconds by a single
> core. There is some shuffling involved in my application. As I increase
> the number of streams and the number of executors accordingly, the
> application's scheduling delay increases and becomes unmanageable within
> 2 seconds. I believe this happens because, with that many streams, the
> number of tasks increases, so the shuffling magnifies, and also because
> all streams use the same executors. *Is it possible to dedicate part of
> the executors to a particular stream while processing streams
> simultaneously?* E.g. if I have 15 cores on the cluster and 5 streams,
> 5 cores will be taken by the 5 receivers; of the remaining 10, can I give
> 2 cores each to one of the 5 streams? Just to add, increasing the batch
> interval does help, but I don't want to increase the batch size due to
> application restrictions and delayed results (blockInterval and
> defaultParallelism help to a limited extent).
>
> *Please see the attached file for the CODE SNIPPET.*
>
> Regards,
> Laeeq