Hi,

If you need to use the receiver-based approach, you can try this one:
https://github.com/dibbhatt/kafka-spark-consumer

It is also available on Spark Packages:
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

You just need to specify the number of receivers you want for the desired
parallelism while receiving, and the rest will be taken care of by
ReceiverLauncher.

This low-level receiver gives better parallelism both when receiving
and when processing the RDD.

The default receiver-based API (KafkaUtils.createStream) uses the Kafka
high-level consumer API, which has serious issues when used in
production.
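
For comparison, the "multiple streams plus union" approach that Cody
describes below for the default receiver API could be sketched roughly as
follows. This is only an illustration, not code from either library: the
helper name union_receiver_streams and the create_stream callable are
hypothetical, and the only thing assumed about ssc is that it exposes
union(*dstreams), as PySpark's StreamingContext does.

```python
def union_receiver_streams(ssc, create_stream, num_receivers):
    """Start several receiver-based input streams and union them into one.

    ssc            -- a StreamingContext (anything exposing union(*dstreams))
    create_stream  -- hypothetical zero-argument callable that returns one
                      receiver-based DStream each time it is called
    num_receivers  -- how many receivers (input streams) to start
    """
    if num_receivers < 1:
        raise ValueError("num_receivers must be at least 1")

    # Each call creates one input stream, and therefore one receiver.
    streams = [create_stream() for _ in range(num_receivers)]

    # A single stream needs no union.
    if len(streams) == 1:
        return streams[0]

    # Merge all input streams into one DStream for downstream processing.
    return ssc.union(*streams)
```

With Spark 1.x, create_stream could be something like
lambda: KafkaUtils.createStream(ssc, zk_quorum, group_id, {topic: 1}),
where zk_quorum, group_id and topic are placeholders for your own settings.
Keep in mind that each receiver occupies one core, so the application needs
more cores than receivers for any processing to happen.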


Regards,
Dibyendu

On Fri, Oct 2, 2015 at 9:22 PM, <nib...@free.fr> wrote:

> From my understanding, as soon as I use YARN I don't need to manage
> parallelism myself (at least for RDD processing).
> I don't want to use the direct stream because I would have to manage the
> offset positioning (in order to be able to restart from the last offset
> processed after a Spark job failure).
>
>
> ----- Original Message -----
> From: "Cody Koeninger" <c...@koeninger.org>
> To: "Nicolas Biau" <nib...@free.fr>
> Cc: "user" <user@spark.apache.org>
> Sent: Friday, October 2, 2015 17:43:41
> Subject: Re: Spark Streaming over YARN
>
>
> If you're using the receiver based implementation, and want more
> parallelism, you have to create multiple streams and union them together.
>
>
> Or use the direct stream.
>
>
> On Fri, Oct 2, 2015 at 10:40 AM, < nib...@free.fr > wrote:
>
>
> Hello,
> I have a job receiving data from Kafka (4 partitions) and persisting data
> in MongoDB.
> It works fine, but when I deploy it on a YARN cluster (4 nodes with 2
> cores), only one node receives all the Kafka partitions and only one node
> runs my RDD processing (foreach function).
> How can I force YARN to use all the nodes and cores to process
> the data (receiver & RDD processing)?
>
> Thanks a lot
> Nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>