Re: Spark Streaming over YARN

2015-10-04 Thread nibiau
4 partitions. - Original message - From: "Dibyendu Bhattacharya" To: "Nicolas Biau" Cc: "Cody Koeninger", "user" Sent: Sunday, 4 October 2015 16:51:38 Subject: Re: Spark Streaming over YARN How many partitions are there in your Kafka topic? R

Re: Spark Streaming over YARN

2015-10-04 Thread Dibyendu Bhattacharya
ng the Kafka flow. > (I use Spark 1.3.1) > > Tks > Nicolas > > > - Original message - > From: "Dibyendu Bhattacharya" > To: nib...@free.fr > Cc: "Cody Koeninger", "user" > Sent: Friday, 2 October 2015 18:21:35 > Subject: Re: Spa

Re: Spark Streaming over YARN

2015-10-04 Thread nibiau
free.fr Cc: "Cody Koeninger", "user" Sent: Friday, 2 October 2015 18:21:35 Subject: Re: Spark Streaming over YARN If your Kafka topic has 4 partitions and you specify 4 receivers, messages from each partition are received by a dedicated receiver, so your receiving paralle
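To make the partition-to-receiver point concrete, here is a small toy model in plain Python. It does not call Spark or Kafka at all: the function names `assign_partitions` and `receiving_parallelism` are made up for illustration, and the round-robin spread is an assumption, since the real assignment is decided by the consumer in use.

```python
def assign_partitions(num_partitions, num_receivers):
    """Toy model: spread partition ids over receivers round-robin.

    Round-robin is an assumption made for illustration only; the real
    assignment is decided by the Kafka consumer, not by this function.
    """
    if num_partitions < 0 or num_receivers < 1:
        raise ValueError("need num_partitions >= 0 and num_receivers >= 1")
    assignment = {receiver: [] for receiver in range(num_receivers)}
    for partition in range(num_partitions):
        assignment[partition % num_receivers].append(partition)
    return assignment


def receiving_parallelism(assignment):
    """Count the receivers that own at least one partition."""
    return sum(1 for partitions in assignment.values() if partitions)


if __name__ == "__main__":
    # Compare 1, 2 and 4 receivers for a 4-partition topic.
    for receivers in (1, 2, 4):
        layout = assign_partitions(4, receivers)
        print(receivers, "receiver(s):", layout,
              "-> parallelism", receiving_parallelism(layout))
```

In this model a single receiver ends up owning all 4 partitions, which matches the "only one node is receiving" symptom from the original question, while 4 receivers each own exactly one partition. Adding receivers beyond the partition count leaves the extra ones idle.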

Re: Spark Streaming over YARN

2015-10-02 Thread Cody Koeninger
" > To: "Nicolas Biau" > Cc: "user" > Sent: Friday, 2 October 2015 18:29:09 > Subject: Re: Spark Streaming over YARN > > > Neither of those statements is true. > You need more receivers if you want more parallelism. > You don't have to ma

Re: Spark Streaming over YARN

2015-10-02 Thread nibiau
Friday, 2 October 2015 18:29:09 Subject: Re: Spark Streaming over YARN Neither of those statements is true. You need more receivers if you want more parallelism. You don't have to manage offset positioning with the direct stream if you don't want to, as long as you can accept the limitation

Re: Spark Streaming over YARN

2015-10-02 Thread Cody Koeninger
failure) > > > - Original message - > From: "Cody Koeninger" > To: "Nicolas Biau" > Cc: "user" > Sent: Friday, 2 October 2015 17:43:41 > Subject: Re: Spark Streaming over YARN > > > If you're using the receiver-based i

Re: Spark Streaming over YARN

2015-10-02 Thread Dibyendu Bhattacharya
m as I have to manage the offset > positioning (in order to be able to start from the last offset processed > after a Spark job failure) > > > - Original message - > From: "Cody Koeninger" <c...@koeninger.org> > To: "Nicolas Biau" <nib...@free.fr>

Re: Spark Streaming over YARN

2015-10-02 Thread nibiau
", "user" Sent: Friday, 2 October 2015 18:01:59 Subject: Re: Spark Streaming over YARN Hi, If you need to use the receiver-based approach, you can try this one: https://github.com/dibbhatt/kafka-spark-consumer This is also part of Spark Packages: http://spark-packages.org/package

Re: Spark Streaming over YARN

2015-10-02 Thread Dibyendu Bhattacharya
o manage the offset > positioning (in order to be able to start from the last offset processed > after a Spark job failure) > > > - Original message - > From: "Cody Koeninger" > To: "Nicolas Biau" > Cc: "user" > Sent: Friday, 2 October 2015 17:43:

Re: Spark Streaming over YARN

2015-10-02 Thread nibiau
-- Original message - From: "Cody Koeninger" To: "Nicolas Biau" Cc: "user" Sent: Friday, 2 October 2015 17:43:41 Subject: Re: Spark Streaming over YARN If you're using the receiver-based implementation and want more parallelism, you have to create multiple st

Re: Spark Streaming over YARN

2015-10-02 Thread Cody Koeninger
If you're using the receiver-based implementation and want more parallelism, you have to create multiple streams and union them together. Or use the direct stream. On Fri, Oct 2, 2015 at 10:40 AM, wrote: > Hello, > I have a job receiving data from Kafka (4 partitions) and persisting data > ins

Spark Streaming over YARN

2015-10-02 Thread nibiau
Hello, I have a job receiving data from Kafka (4 partitions) and persisting data inside MongoDB. It works fine, but when I deploy it inside a YARN cluster (4 nodes with 2 cores), only one node is receiving all the Kafka partitions and only one node is processing my RDD (foreach function) H