I'd need more details, but you might want to filter the data first (creating 
multiple RDDs) and then process each of them.
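
On the Scala Future question quoted below: Futures can be used inside the function passed to foreachPartition, as long as the function waits for them before it returns. Otherwise the task can finish while calls are still in flight. Below is a minimal sketch using only the Scala standard library. The names ParallelPartitionCalls, callService and processPartition are hypothetical, callService is a stand-in for a real REST client, and the pool size and timeout are assumed values to tune.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelPartitionCalls {
  // Hypothetical stand-in for a blocking REST call; replace with a real client.
  def callService(record: String): String = {
    Thread.sleep(50) // simulate network latency
    s"ok:$record"
  }

  // Processes one partition's records with at most `parallelism` calls in flight.
  // Records are taken in batches; each batch is awaited before the next starts,
  // so the partition's iterator is never fully materialised at once.
  def processPartition(
      records: Iterator[String],
      parallelism: Int = 8,                  // assumed value, tune per service
      timeout: FiniteDuration = 30.seconds   // assumed per-batch timeout
  )(call: String => String): Vector[String] = {
    // A dedicated pool keeps blocking calls off the global execution context.
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      records.grouped(parallelism).flatMap { batch =>
        val futures = batch.map(record => Future(call(record)))
        // Block until the whole batch is done; a failed call rethrows here.
        Await.result(Future.sequence(futures), timeout)
      }.toVector
    } finally {
      pool.shutdown()
    }
  }
}
```

Inside a job this would be called roughly as rdd.foreachPartition(iter => ParallelPartitionCalls.processPartition(iter)(ParallelPartitionCalls.callService)), assuming an RDD[String]. Batching is the simplest way to bound concurrency, at the cost of each batch waiting for its slowest call. Creating one pool per executor JVM rather than per partition would avoid repeated thread start-up.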


> On Oct 5, 2015, at 8:35 PM, Chen Song <chen.song...@gmail.com> wrote:
> 
> We have a use case with the following design in Spark Streaming.
> 
> Within each batch,
> * data is read and partitioned by some key
> * foreachPartition is used to process the entire partition
> * within each partition, there are several REST clients created to connect to 
> different REST services
> * for the list of records within each partition, it calls these services; 
> each service call is independent of the others, and records are just 
> pre-partitioned to make these calls more efficient.
> 
> I have a question:
> * Since each call is time-consuming, and to prevent the calls from being 
> executed sequentially, how can I parallelize the service calls while 
> processing each partition? Can I just use Scala Futures within 
> foreachPartition (or mapPartitions)?
> 
> Any suggestions greatly appreciated.
> 
> Chen
> 
> 

