Thanks Tathagata! I tried it, and it worked perfectly.
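For anyone finding this thread later, here is a minimal, self-contained sketch of the two steps. It uses a fake Connection class so it compiles on its own; in real code the pool would hand out JDBC/HBase/etc. connections, and mapPartition() would be the function body passed to rdd.mapPartitions(). All names here are illustrative, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ConnectionPoolSketch {

    // Stand-in for a real database connection (hypothetical API).
    static class Connection {
        String lookup(String key) { return "value-for-" + key; }
    }

    // Step 2: a lazily initialized, JVM-wide singleton pool. All tasks in an
    // executor share one JVM, so the holder-class idiom gives thread-safe
    // lazy initialization without explicit locking.
    static class ConnectionPool {
        private static class Holder {
            static final ConnectionPool INSTANCE = new ConnectionPool();
        }
        static ConnectionPool get() { return Holder.INSTANCE; }

        // A trivially small "pool"; a real one would manage several
        // connections with checkout/return semantics.
        private final Connection conn = new Connection();
        synchronized Connection borrow() { return conn; }
    }

    // Step 1: the body you would pass to rdd.mapPartitions(): borrow a
    // connection once per partition, buffer all the results, and return
    // an iterator over the buffer.
    static Iterator<String> mapPartition(Iterator<String> keys) {
        Connection conn = ConnectionPool.get().borrow();
        List<String> buffer = new ArrayList<>();
        while (keys.hasNext()) {
            buffer.add(conn.lookup(keys.next()));
        }
        return buffer.iterator();
    }

    public static void main(String[] args) {
        Iterator<String> out = mapPartition(List.of("a", "b").iterator());
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

The key points are that the connection is created once per partition rather than once per record, and the pool itself is created once per worker JVM rather than once per task.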

On Thu, Jul 17, 2014 at 6:34 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Step 1: use rdd.mapPartitions(). That's equivalent to the mapper of
> MapReduce. You can open a connection, get all the data and buffer it, close
> the connection, and return an iterator over the buffer.
> Step 2: Make step 1 better by making it reuse connections. You can use
> singletons / static vars to lazily initialize and reuse a pool of
> connections. You will have to take care of concurrency, as multiple tasks
> may be using the database in parallel in the same worker JVM.
>
> TD
>
>
> On Thu, Jul 17, 2014 at 4:41 PM, Guangle Fan <fanguan...@gmail.com> wrote:
>
>> Hi, All
>>
>> When I run spark streaming, in one of the flatMap stage, I want to access
>> database.
>>
>> Code looks like :
>>
>> stream.flatMap(
>>   new FlatMapFunction<String, String>() {
>>     public Iterable<String> call(String s) {
>>       // access database cluster
>>     }
>>   }
>> )
>>
>> Since I don't want to create a database connection every time call() is
>> called, where is the best place to create the connection and reuse it on a
>> per-host basis (like one database connection per Mapper/Reducer)?
>>
>> Regards,
>>
>> Guangle
>>
>>
>