Hi John,

It seems like the original problem you had was that you were initializing the RabbitMQ connection on the driver, but then calling the code that writes to RabbitMQ on the workers (I'm guessing, since I didn't see your code). That's definitely a problem, because the connection can't be serialized and shipped to the workers:

    dstream.foreachRDD(rdd => {
      // Create the connection / channel to RabbitMQ here, on the driver.
      rdd.foreach(element =>
        // Tries to write to RabbitMQ from a worker. This fails because
        // the connection cannot be serialized and passed to the workers.
        // write using channel
      )
    })

You then changed the code to open the connection on the workers and write the data out from there. This works, but as you saw, you write the data multiple times: once per partition.

    dstream.foreachRDD(rdd => {
      rdd.foreachPartition(iter => {
        // Create or get a singleton instance of a connection / channel.
        iter.foreach(element =>
          // write using connection / channel
        )
        // This works and writes, but writes once per partition.
      })
    })
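In case it helps, here is roughly what the "singleton instance" part could look like. This is just a minimal sketch, assuming the standard RabbitMQ Java client (com.rabbitmq.client); the object name, host, and queue name are all illustrative, not from your code:

    import com.rabbitmq.client.{Channel, Connection, ConnectionFactory}

    // A Scala object is initialized lazily, once per JVM, so every task
    // running in the same executor reuses the same connection / channel.
    object RabbitMQWriter {
      lazy val connection: Connection = {
        val factory = new ConnectionFactory()
        factory.setHost("rabbitmq-host")  // illustrative host name
        factory.newConnection()
      }

      lazy val channel: Channel = {
        val ch = connection.createChannel()
        ch.queueDeclare("events", false, false, false, null)  // illustrative queue
        ch
      }

      def send(message: String): Unit =
        channel.basicPublish("", "events", null, message.getBytes("UTF-8"))
    }

With something like that on the executors' classpath, the body of foreachPartition becomes just iter.foreach(element => RabbitMQWriter.send(element.toString)). One caveat: a RabbitMQ Channel is not thread-safe, so in a real job you'd want one channel per task or a small pool; the sketch keeps it simple.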
If I understand correctly, though, I believe what you actually want is to open the connection on the driver, and write to RabbitMQ from the driver as well:

    dstream.foreachRDD(rdd => {
      // Any code you put here is executed on the driver.
      rdd.foreach(element =>
        // Any code inside rdd.foreach is executed on the workers.
      )
      // This code is executed back on the driver again, once per batch.
      // You can open the connection and write your stats here.
    })

Writing from the driver this way also means you don't need .repartition(1) at all. You're right to be wary of it: repartitioning to a single partition funnels all the data through one task, which would hurt scalability.

Does this clear things up? Hopefully it's now clear when code in a streaming program executes on the driver vs. on the workers.

-Vida

On Mon, Aug 18, 2014 at 2:20 PM, jschindler <john.schind...@utexas.edu> wrote:
> Well, it looks like I can use the .repartition(1) method to stuff
> everything in one partition so that gets rid of the duplicate messages
> I send to RabbitMQ, but that seems like a bad idea perhaps. Wouldn't
> that hurt scalability?