Need more details but you might want to filter the data first ( create multiple RDD) and then process.
> On Oct 5, 2015, at 8:35 PM, Chen Song <chen.song...@gmail.com> wrote: > > We have a use case with the following design in Spark Streaming. > > Within each batch, > * data is read and partitioned by some key > * forEachPartition is used to process the entire partition > * within each partition, there are several REST clients created to connect to > different REST services > * for the list of records within each partition, it will call these services, > each service call is independent of others; records are just pre-partitioned > to make these calls more efficiently. > > I have a question > * Since each call is time taking and to prevent the calls to be executed > sequentially, how can I parallelize the service calls within processing of > each partition? Can I just use Scala future within forEachPartition(or > mapPartitions)? > > Any suggestions greatly appreciated. > > Chen > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org