Thanks Tathagata! I tried it, and it worked out perfectly.
On Thu, Jul 17, 2014 at 6:34 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Step 1: use rdd.mapPartitions(). That's equivalent to the mapper of
> MapReduce. You can open a connection, get all the data and buffer it,
> close the connection, and return an iterator over the buffer.
> Step 2: Make step 1 better by making it reuse connections. You can use
> singletons / static vars to lazily initialize and reuse a pool of
> connections. You will have to take care of concurrency, as multiple tasks
> may be using the database in parallel in the same worker JVM.
>
> TD
>
>
> On Thu, Jul 17, 2014 at 4:41 PM, Guangle Fan <fanguan...@gmail.com> wrote:
>
>> Hi, All
>>
>> When I run Spark Streaming, in one of the flatMap stages I want to access
>> a database.
>>
>> Code looks like:
>>
>> stream.flatMap(
>>   new FlatMapFunction {
>>     call() {
>>       // access database cluster
>>     }
>>   }
>> )
>>
>> Since I don't want to create a database connection every time call() is
>> invoked, where is the best place to create the connection and reuse it on
>> a per-host basis (like one database connection per Mapper/Reducer)?
>>
>> Regards,
>>
>> Guangle
>>
>>
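For anyone finding this thread later: TD's "step 2" (a lazily initialized, JVM-wide pool of reusable connections, safe under concurrent tasks) can be sketched roughly like this. This is a minimal illustration, not Spark API code; `DbConnection` and `ConnectionPool` are hypothetical names I made up, and a real job would use an actual driver class such as java.sql.Connection instead.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of a per-JVM connection pool, as suggested in step 2 above.
// Because the field is static, all tasks running in the same worker JVM
// share one pool; connections are created lazily and returned for reuse.
public class ConnectionPool {

    // Hypothetical stand-in for a real database connection, so this
    // sketch is self-contained and compilable.
    public static class DbConnection {
        public String query(String q) { return "result-for-" + q; }
    }

    // ConcurrentLinkedQueue is thread-safe, which covers the concurrency
    // concern: multiple tasks may borrow/return connections in parallel.
    private static final ConcurrentLinkedQueue<DbConnection> pool =
            new ConcurrentLinkedQueue<>();

    // Reuse an idle connection if one exists, otherwise create lazily.
    public static DbConnection borrow() {
        DbConnection c = pool.poll();
        return (c != null) ? c : new DbConnection();
    }

    // Return the connection to the pool instead of closing it.
    public static void giveBack(DbConnection c) {
        pool.offer(c);
    }
}
```

Inside `rdd.mapPartitions(...)` each task would then call `ConnectionPool.borrow()` once per partition, issue all its queries, and call `ConnectionPool.giveBack(...)` before returning the iterator, so the connection survives for the next task scheduled on that worker.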