In order to use an existing pg UDF, you can create a view in pg that wraps the
UDF and expose that view to Hive.
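
For example, a rough sketch (view name, UDF, JDBC URL and credentials below are
placeholders, and it assumes the Spark 2.x DataFrame API with the Thrift server
sharing the same session):

    import org.apache.spark.sql.SparkSession

    object PgViewDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("pg-view-demo")
          .enableHiveSupport()
          .getOrCreate()

        // "sales_summary" stands for a view created in Postgres that already
        // wraps the pg UDF, e.g.
        //   CREATE VIEW sales_summary AS
        //     SELECT region, my_pg_udf(amount) AS total FROM sales GROUP BY region;
        val df = spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/mydb")  // placeholder URL
          .option("dbtable", "sales_summary")
          .option("user", "spark")                              // placeholder credentials
          .option("password", "secret")
          .load()

        // Register it so a Thrift server started on this session can query it
        df.createOrReplaceTempView("sales_summary")
      }
    }

The UDF runs inside Postgres when the view is evaluated, so Spark only sees the
result columns.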
The Spark-to-database connections are opened from each executor, so you must
have a connection, or a pool of connections, per worker. Executors on the same
worker can share a connection pool.
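
One common pattern for that (just a sketch; the pool settings, URL and
credentials are placeholders, and it assumes commons-dbcp2 and the Postgres
JDBC driver are on the executor classpath) is a lazily initialised singleton,
so every task running in the same executor JVM reuses the same pool:

    import java.sql.Connection
    import org.apache.commons.dbcp2.BasicDataSource
    import org.apache.spark.sql.SparkSession

    // One pool per executor JVM: initialised lazily on first use and shared
    // by all tasks scheduled on that executor.
    object PgPool {
      lazy val ds: BasicDataSource = {
        val d = new BasicDataSource()
        d.setDriverClassName("org.postgresql.Driver")
        d.setUrl("jdbc:postgresql://dbhost:5432/mydb")  // placeholder URL
        d.setUsername("spark")                          // placeholder credentials
        d.setPassword("secret")
        d.setMaxTotal(8)                                // max connections per executor
        d
      }
      def connection: Connection = ds.getConnection
    }

    object PoolDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pool-demo").getOrCreate()
        // Each partition borrows from the executor-local pool instead of
        // opening a fresh connection per record.
        spark.range(0, 100).rdd.foreachPartition { ids =>
          val conn = PgPool.connection
          try {
            val ps = conn.prepareStatement("SELECT 1")  // stand-in for real per-partition work
            ids.foreach(_ => ps.execute())
            ps.close()
          } finally conn.close()                        // hands the connection back to the pool
        }
        spark.stop()
      }
    }

The pool lives as long as the executor JVM, which also amortises the connection
setup cost across queries.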

Best
Ayan
On 25 Jul 2016 16:48, "Marco Colombo" <ing.marco.colo...@gmail.com> wrote:

> Hi all!
> Among other use cases, I want to use Spark as a distributed SQL engine
> via the Thrift server.
> I have some tables in Postgres and Cassandra: I need to expose them via
> hive for custom reporting.
> A basic implementation is simple and works, but I have some concerns and
> open questions:
> - is there a better approach than mapping a temp table as a select of the
> full table?
> - What about query setup cost? I mean, is there a way to avoid db
> connection setup costs using a pre-created connection pool?
> - is it possible from HiveQL to use functions defined in the pg database,
> or do I have to rewrite them as UDAFs?
>
> Thanks!
>
>
>
> --
> Ing. Marco Colombo
>
