You can probably try the Low Level Consumer from spark-packages (http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).
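For illustration only, here is a rough sketch of the receiver-per-topic idea (explained further below) using the stock high-level KafkaUtils.createStream receiver that ships with spark-streaming-kafka; the ZooKeeper quorum, consumer group, topic names and receiver counts are just placeholders, and the low-level consumer above has its own launcher, described in its read-me.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// 2 second batches, matching the batch interval mentioned in the quoted mail below
val ssc = new StreamingContext(new SparkConf().setAppName("multi-topic-kafka"), Seconds(2))

// Placeholder: how many receivers you want for each topic
val receiversPerTopic = Map("topicA" -> 1, "topicB" -> 2, "topicC" -> 3)

val perTopicStreams = receiversPerTopic.flatMap { case (topic, numReceivers) =>
  (1 to numReceivers).map { _ =>
    // Each createStream call starts one receiver, and each receiver occupies one executor core.
    // "zk1:2181" and "my-group" are placeholders for your ZooKeeper quorum and consumer group id.
    KafkaUtils.createStream(ssc, "zk1:2181", "my-group",
      Map(topic -> 1), StorageLevel.MEMORY_AND_DISK_SER)
  }
}.toSeq

// Union all per-topic streams into one DStream for downstream processing
val unioned = ssc.union(perTopicStreams)
unioned.count().print()

ssc.start()
ssc.awaitTermination()

Unioning the per-topic streams like this keeps the number of receivers under your control, at the cost of one executor core per receiver.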
How many partitions are there for your topics? Say you have 10 topics, each with 3 partitions; ideally you can then create at most 30 parallel Receivers and 30 streams. What I understand from your requirement is that, for any given topic, you want to choose the number of Receivers, e.g. 1 Receiver for Topic A, 2 for Topic B, 3 for Topic C, and so on. If you can distribute the topics across Receivers like this, you can very well use the above consumer, which has this facility. Each Receiver task takes one executor core, so you can plan your core allocation accordingly.

The implementation has a code example and a read-me file; if you wish to try it, you can always email me.

Regards,
Dibyendu

On Fri, Apr 17, 2015 at 3:06 PM, Laeeq Ahmed <laeeqsp...@yahoo.com.invalid> wrote:

> Hi,
>
> I am working with multiple Kafka streams (23 streams) and currently I am
> processing them separately. I receive one stream from each topic. I have
> the following questions.
>
> 1. The Spark Streaming guide suggests unioning these streams. *Is it
> possible to get statistics of each stream even after they are unioned?*
>
> 2. My calculations are not complex. I use a 2 second batch interval, and
> if I use 2 streams they are easily processed under 2 seconds by a single
> core. There is some shuffling involved in my application. As I increase
> the number of streams and the number of executors accordingly, the
> application's scheduling delay increases and becomes unmanageable within
> 2 seconds. I believe this happens because, with that many streams, the
> number of tasks increases, so the shuffling magnifies, and also because
> all streams use the same executors. *Is it possible to dedicate part of
> the executors to a particular stream while processing streams
> simultaneously?* E.g. if I have 15 cores on the cluster and 5 streams,
> 5 cores will be taken by the 5 receivers; of the remaining 10, can I give
> 2 cores each to one of the 5 streams? Just to add, increasing the batch
> interval does help, but I don't want to increase the batch size due to
> application restrictions and delayed results (blockInterval and
> defaultParallelism help to a limited extent).
>
> *Please see the attached file for the CODE SNIPPET.*
>
> Regards,
> Laeeq