Hi,
I was wondering if there's any way of having long-running, session-like 
behaviour in Spark. For example, let's say we're using Spark Streaming to 
listen to a stream of events. Upon receiving an event, we process it, and if 
certain conditions are met, we wish to send a message to RabbitMQ. Now, 
RabbitMQ clients have the concept of a connection factory, from which you 
create a connection, from which you create a channel. You use the channel to 
declare a queue, and finally the queue is what you publish messages to.
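
Concretely, that setup chain looks something like this with the RabbitMQ Java 
client (the host and queue name here are just placeholders):

   import com.rabbitmq.client.ConnectionFactory

   val factory = new ConnectionFactory()
   factory.setHost("rabbit-host")           // placeholder host
   val connection = factory.newConnection() // expensive: TCP + AMQP handshake
   val channel = connection.createChannel()
   channel.queueDeclare("events", true, false, false, null)
   channel.basicPublish("", "events", null, "payload".getBytes("UTF-8"))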

Currently, what I'm doing can be summarised as:

dstream.foreachRDD(rdd => rdd.foreachPartition(partition => {
   // Expensive setup, repeated for every partition of every batch:
   val factory = new ConnectionFactory()
   val connection = factory.newConnection()
   val channel = connection.createChannel()
   channel.queueDeclare("events", true, false, false, null)

   partition.foreach(msg => Processor.Process(msg, channel))

   // Clean up the queue stuff.
   channel.close()
   connection.close()
}))

I'm doing the same thing for Cassandra, etc. In these cases, session 
initiation is expensive, so doing it per message is not a good idea. 
However, I can't find a way to say "hey... do this per worker once and only 
once".

Is there a better pattern to do this?

Regards,
Ashic.
                                          
