Thank you, but that doesn't answer my general question. I might need to enrich my records using different data sources (or DBs).
So the general use case I need to support is some kind of Function that has init() logic for creating a connection to the DB, queries the DB for each record and enriches the input record with data from the DB, and has close() logic for closing the connection.

I have implemented this kind of use case using Map/Reduce, and I want to know how I can do it with Spark.

Thanks

On Fri, Jul 25, 2014 at 6:24 AM, Yanbo Liang <[email protected]> wrote:

> You can refer to this topic:
> http://www.mapr.com/developercentral/code/loading-hbase-tables-spark
>
> 2014-07-24 22:32 GMT+08:00 Yosi Botzer <[email protected]>:
>
>> In my case I want to reach HBase. For every record with userId I want to
>> get some extra information about the user and add it to the result record
>> for further processing.
>>
>> On Thu, Jul 24, 2014 at 9:11 AM, Yanbo Liang <[email protected]>
>> wrote:
>>
>>> If you want to connect to a DB in your program, you can use JdbcRDD (
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
>>> )
>>>
>>> 2014-07-24 18:32 GMT+08:00 Yosi Botzer <[email protected]>:
>>>
>>>> Hi,
>>>>
>>>> I am using the Java API of Spark.
>>>>
>>>> I wanted to know if there is a way to run some code in a manner that is
>>>> like the setup() and cleanup() methods of Hadoop Map/Reduce.
>>>>
>>>> The reason I need it is that I want to read something from the DB
>>>> according to each record I scan in my Function, and I would like to open
>>>> the DB connection only once (and close it only once).
>>>>
>>>> Thanks
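To make the use case at the top of this message concrete, here is a minimal plain-Java sketch of the shape I have in mind: init() once, one lookup per record, close() once. It is only an illustration, not Spark or Hadoop code. Every name in it (Enricher, InMemoryEnricher, EnrichSketch, enrichAll) is hypothetical, and an in-memory map stands in for the real DB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical lifecycle for a per-record enrichment step (not a Spark or Hadoop API).
interface Enricher extends AutoCloseable {
    void init();                   // open the DB connection once
    String enrich(String userId);  // look up extra data for one record
    @Override
    void close();                  // close the connection once
}

// Stand-in for a real DB client: an in-memory map plays the role of the user table.
class InMemoryEnricher implements Enricher {
    private Map<String, String> users;  // null while "disconnected"
    int initCalls = 0;
    int closeCalls = 0;

    @Override
    public void init() {
        users = new HashMap<>();
        users.put("u1", "Alice");
        users.put("u2", "Bob");
        initCalls++;
    }

    @Override
    public String enrich(String userId) {
        if (users == null) {
            throw new IllegalStateException("init() was not called");
        }
        // Append the looked-up name to the record, or "unknown" if the user is missing.
        return userId + "," + users.getOrDefault(userId, "unknown");
    }

    @Override
    public void close() {
        users = null;
        closeCalls++;
    }
}

class EnrichSketch {
    // Calls init() once, enrich() once per record, and close() once,
    // even if one of the lookups throws.
    static List<String> enrichAll(Iterator<String> records, Enricher enricher) {
        enricher.init();
        try {
            List<String> out = new ArrayList<>();
            while (records.hasNext()) {
                out.add(enricher.enrich(records.next()));
            }
            return out;
        } finally {
            enricher.close();
        }
    }

    public static void main(String[] args) {
        InMemoryEnricher enricher = new InMemoryEnricher();
        List<String> result =
                enrichAll(Arrays.asList("u1", "u2", "u3").iterator(), enricher);
        System.out.println(result);
        System.out.println(enricher.initCalls + " init, " + enricher.closeCalls + " close");
    }
}
```

What I am looking for is the Spark equivalent of enrichAll, so that the connection is opened and closed once per batch of records rather than once per record.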
