Hi,

If you need to use the receiver-based approach, you can try this one: https://github.com/dibbhatt/kafka-spark-consumer

It is also available as a Spark package: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

You just need to specify the number of receivers you want for the desired parallelism while receiving, and ReceiverLauncher takes care of the rest. This low-level receiver gives better parallelism both on receiving and on processing the resulting RDDs. The default receiver-based API (KafkaUtils.createStream) uses the Kafka high-level consumer API, which has serious issues that make it unsuitable for production use.
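In practice it looks roughly like this (a minimal Scala sketch; the property keys, the import path, and the exact ReceiverLauncher.launch signature should be checked against the project README, and the ZooKeeper host and topic name are placeholders):

    import java.util.Properties
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import consumer.kafka.ReceiverLauncher

    val conf = new SparkConf().setAppName("kafka-to-mongo")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder connection details -- see the README for the exact keys
    val props = new Properties()
    props.put("zookeeper.hosts", "zkhost")
    props.put("zookeeper.port", "2181")
    props.put("kafka.topic", "mytopic")
    props.put("kafka.consumer.id", "my-consumer-id")

    // One receiver per Kafka partition (4 in this thread's setup)
    val numberOfReceivers = 4

    // ReceiverLauncher starts the receivers and unions their
    // output into a single DStream
    val unionStream = ReceiverLauncher.launch(ssc, props, numberOfReceivers,
      StorageLevel.MEMORY_ONLY)

    unionStream.foreachRDD { rdd =>
      // persist to MongoDB here
    }

    ssc.start()
    ssc.awaitTermination()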
Regards,
Dibyendu

On Fri, Oct 2, 2015 at 9:22 PM, <nib...@free.fr> wrote:
> From my understanding, as soon as I use YARN I don't need to manage
> parallelism myself (at least for the RDD treatment).
> I don't want to use the direct stream because I would have to manage the
> offset positioning myself (in order to be able to restart from the last
> offset processed after a Spark job failure).
>
> ----- Original Message -----
> From: "Cody Koeninger" <c...@koeninger.org>
> To: "Nicolas Biau" <nib...@free.fr>
> Cc: "user" <user@spark.apache.org>
> Sent: Friday, October 2, 2015 17:43:41
> Subject: Re: Spark Streaming over YARN
>
> If you're using the receiver-based implementation and want more
> parallelism, you have to create multiple streams and union them together.
>
> Or use the direct stream.
>
> On Fri, Oct 2, 2015 at 10:40 AM, <nib...@free.fr> wrote:
>
> Hello,
> I have a job receiving data from Kafka (4 partitions) and persisting the
> data into MongoDB.
> It works fine, but when I deploy it on a YARN cluster (4 nodes with 2
> cores each), only one node receives all the Kafka partitions and only one
> node processes my RDD treatment (the foreach function).
> How can I force YARN to use all the nodes and cores to process the data
> (both the receiver and the RDD treatment)?
>
> Thanks a lot,
> Nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
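For reference, the multiple-streams-and-union pattern Cody describes in the quoted message would look roughly like this against the Kafka 0.8 receiver API (a sketch; ssc is the StreamingContext from the snippet above, and the ZooKeeper quorum, group id, and topic name are placeholders):

    import org.apache.spark.streaming.kafka.KafkaUtils

    // Placeholder connection details
    val zkQuorum = "zkhost:2181"
    val groupId  = "my-consumer-group"
    val topicMap = Map("mytopic" -> 1) // topic -> consumer threads per receiver

    // One receiver per Kafka partition; Spark schedules each
    // receiver on its own executor when resources allow
    val numReceivers = 4
    val kafkaStreams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, topicMap)
    }

    // Union into one DStream, then repartition so the RDD treatment
    // is spread across all 8 cores rather than only the receiver nodes
    val messages = ssc.union(kafkaStreams).repartition(8)
    messages.foreachRDD { rdd =>
      // persist to MongoDB here
    }

Keep in mind that each receiver permanently occupies one core, so on a 4-node / 2-core cluster four receivers leave only four cores for processing; repartitioning after the union lets those remaining cores share the work.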