Hi John,

It seems like the original problem you had was that you were initializing the
RabbitMQ connection on the driver, but then calling the code to write to
RabbitMQ on the workers (I'm guessing, but I don't know since I didn't see
your code).  That's definitely a problem because the connection can't be
serialized and passed to the workers.

dstream.foreachRDD(rdd => {
   // Create the connection / channel here - this runs on the driver.
   rdd.foreach(element =>
      // Tries to write to RabbitMQ from a worker - this is a problem,
      // since the connection cannot be serialized and sent to the workers.
      // write using channel
   )
})


You then changed the code to open the connection on the workers, and write
out the data once per partition.  This worked, but as you saw, the data is
written multiple times - once for each partition.

dstream.foreachRDD(rdd => {
    rdd.foreachPartition(iter => {
        // Create or get a singleton instance of a connection / channel.
        iter.foreach(element =>
            // write using connection / channel
        )
        // This works and writes, but writes once per partition.
    })
})


If I understand correctly, I believe what you want is to open the
connection on the driver, and write to RabbitMQ from the driver as well.

dstream.foreachRDD(rdd => {
   // Any code you put here is executed on the driver.
   rdd.foreach(element =>
      // Any code inside rdd.foreach is executed on the workers.
   )

   // This code is executed back on the driver again.
   // You can open the connection and write your stats here.
})
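
As a concrete sketch of that last pattern (the stat here - a record count -
and the publishStats helper are just placeholders for whatever you actually
want to send):

dstream.foreachRDD(rdd => {
    // Runs on the driver once per batch.  count() aggregates on the
    // workers and returns a single result to the driver.
    val recordCount = rdd.count()

    // Back on the driver: open (or reuse) the connection / channel and
    // publish exactly once per batch.  `channel` and `publishStats` are
    // hypothetical stand-ins for your RabbitMQ channel and publish code.
    publishStats(channel, s"records in batch: $recordCount")
})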


Does this clear things up?  I hope it's now clear when code in the
streaming example executes on the driver vs. on the workers.

-Vida





On Mon, Aug 18, 2014 at 2:20 PM, jschindler <john.schind...@utexas.edu>
wrote:

> Well, it looks like I can use the .repartition(1) method to stuff
> everything
> in one partition so that gets rid of the duplicate messages I send to
> RabbitMQ but that seems like a bad idea perhaps.  Wouldn't that hurt
> scalability?
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Writing-to-RabbitMQ-tp11283p12324.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
