To use your existing Postgres UDFs, you can create a view in Postgres that calls them and expose that view to Hive. The Spark-to-database connection happens from each executor, so you need a connection, or a pool of connections, per worker; executors on the same worker can share a connection pool.
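Something along these lines (a rough sketch in the Spark 1.x API to match your setup; the JDBC URL, credentials and the "report_view" name are placeholders, not real values) registers a Postgres view as a table the thrift server can serve:

// Sketch only: expose a Postgres view through Spark SQL.
// The view is assumed to be created in Postgres and to wrap the existing UDFs,
// e.g. CREATE VIEW report_view AS SELECT my_udf(col) AS result FROM source_table;
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("pg-over-thrift"))
val hiveContext = new HiveContext(sc)

val pgView = hiveContext.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://pg-host:5432/reports")   // hypothetical host/db
  .option("dbtable", "report_view")
  .option("user", "report_user")                              // hypothetical credentials
  .option("password", "secret")
  .load()

// Register it so SQL clients can run e.g. SELECT * FROM report_view
pgView.registerTempTable("report_view")

If you start the thrift server on the same context (HiveThriftServer2.startWithContext), beeline clients can then query report_view directly.

For the connection-setup cost, the usual pattern is a lazily initialized object: the pool is created at most once per executor JVM and shared by all tasks running there, instead of opening a new connection per query. This sketch uses HikariCP purely as an example pooling library, and all settings are made up:

import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

object PgPool {
  // Initialized lazily on the executor, never serialized from the driver.
  lazy val dataSource: HikariDataSource = {
    val cfg = new HikariConfig()
    cfg.setJdbcUrl("jdbc:postgresql://pg-host:5432/reports")  // hypothetical
    cfg.setUsername("report_user")
    cfg.setPassword("secret")
    cfg.setMaximumPoolSize(4)  // small pool per executor JVM
    new HikariDataSource(cfg)
  }
}

// Example use from partition-level code (reusing sc from the sketch above):
sc.parallelize(1 to 100).foreachPartition { nums =>
  val conn = PgPool.dataSource.getConnection()
  try {
    nums.foreach { n =>
      // look up or write through `conn` here
    }
  } finally {
    conn.close()  // returns the connection to the pool
  }
}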
Best
Ayan

On 25 Jul 2016 16:48, "Marco Colombo" <ing.marco.colo...@gmail.com> wrote:

> Hi all!
> Among other use cases, I want to use Spark as a distributed SQL engine
> via thrift server.
> I have some tables in Postgres and Cassandra: I need to expose them via
> Hive for custom reporting.
> The basic implementation is simple and works, but I have some concerns and
> open questions:
> - Is there a better approach than mapping a temp table as a select
> of the full table?
> - What about query setup cost? I mean, is there a way to avoid db
> connection setup costs using a pre-created connection pool?
> - Is it possible from HiveQL to use functions defined in the pg database,
> or should I have to rewrite them as UDAFs?
>
> Thanks!
>
> --
> Ing. Marco Colombo