Oh sorry, just to be more clear - writing from the driver program is only
safe if the amount of data you are trying to write is small enough to fit
in memory on the driver.  I looked at your code, and since you are just
writing a few things each time interval, this seems safe.
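For example, here is a rough sketch of what that driver-side write could
look like (the host, the queue name, and the use of count() as the "small
stats" are all placeholders I made up, not taken from your code):

```scala
// Sketch only: assumes the RabbitMQ Java client (com.rabbitmq.client)
// is on the classpath and a StreamingContext / dstream already exists.
import com.rabbitmq.client.ConnectionFactory

dstream.foreachRDD { rdd =>
  // Runs on the driver. Reduce to a small result first, so only a
  // tiny amount of data ends up in driver memory.
  val stats = rdd.count()

  // Open the connection on the driver and write the small result there.
  val factory = new ConnectionFactory()
  factory.setHost("localhost") // placeholder host
  val connection = factory.newConnection()
  val channel = connection.createChannel()
  channel.queueDeclare("stats", false, false, false, null) // placeholder queue
  channel.basicPublish("", "stats", null, stats.toString.getBytes("UTF-8"))
  channel.close()
  connection.close()
}
```

Opening and closing the connection once per batch interval is fine here
because it happens on the driver, not once per record.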


-Vida


On Mon, Aug 18, 2014 at 4:25 PM, Vida Ha <v...@databricks.com> wrote:

> Hi John,
>
> It seems like the original problem you had was that you were initializing
> the RabbitMQ connection on the driver, but then calling the code to write
> to RabbitMQ on the workers (I'm guessing, since I didn't see your code).
> That's definitely a problem, because the connection can't be serialized
> and shipped to the workers.
>
> dstream.foreachRDD(rdd => {
>    // Create the connection / channel to RabbitMQ here, on the driver.
>    rdd.foreach(
>       // Tries to write to RabbitMQ from the worker - this is the
>       // problem, since the connection cannot be serialized and
>       // passed to the workers.
>       element => // write using channel
>    )
> })
>
>
> You then changed the code to open the connection on the workers, and write
> out the data once per partition.  This worked, but as you saw, the data is
> written out multiple times - once for each partition.
>
> dstream.foreachRDD(rdd => {
>     rdd.foreachPartition(iter => {
>         // Create or get a singleton instance of a connection / channel.
>         iter.foreach(
>            element => // write using connection / channel
>         )
>         // This works, but writes once per partition - so the same
>         // stats go out multiple times when there are multiple partitions.
>     })
> })
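> In case worker-side writes are ever what you want, here is a sketch of
> how that per-partition connection is usually shared (ConnectionHolder is
> a hypothetical lazily-initialized singleton object I made up - it is not
> part of Spark or the RabbitMQ client, and the host / queue names are
> placeholders):
>
> object ConnectionHolder {
>   lazy val channel = {
>     val factory = new com.rabbitmq.client.ConnectionFactory()
>     factory.setHost("localhost")  // placeholder host
>     factory.newConnection().createChannel()
>   }
> }
>
> dstream.foreachRDD(rdd => {
>   rdd.foreachPartition(iter => {
>     // Runs on a worker; the lazy val initializes once per worker JVM,
>     // so the connection is reused across partitions and batches.
>     val channel = ConnectionHolder.channel
>     iter.foreach(element =>
>       channel.basicPublish("", "myQueue", null,
>         element.toString.getBytes("UTF-8"))
>     )
>   })
> })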
>
>
> If I understand correctly, I believe what you want is to open the
> connection on the driver, and write to RabbitMQ from the driver as well.
>
> dstream.foreachRDD(rdd => {
>    // Any code you put here is executed on the driver.
>    rdd.foreach(
>       // Any code inside rdd.foreach is executed on the workers.
>    )
>
>    // This code is executed back on the driver again.
>    // You can open the connection and write the stats here.
> })
>
>
> Does this clear things up?  I hope it's very clear now when code in the
> streaming example executes on the driver vs. the workers.
>
> -Vida
>
>
>
>
>
> On Mon, Aug 18, 2014 at 2:20 PM, jschindler <john.schind...@utexas.edu>
> wrote:
>
>> Well, it looks like I can use the .repartition(1) method to stuff
>> everything into one partition, which gets rid of the duplicate messages I
>> send to RabbitMQ - but that seems like a bad idea.  Wouldn't it hurt
>> scalability?
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Writing-to-RabbitMQ-tp11283p12324.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>
