Hi all!

Among other use cases, I want to use Spark as a distributed SQL engine via the Thrift server. I have some tables in Postgres and Cassandra that I need to expose via Hive for custom reporting. A basic implementation is simple and works, but I have some concerns and open questions:

- Is there a better approach than mapping a temp table as a SELECT of the full table? (See the sketch below for what I mean.)
- What about per-query setup cost? Is there a way to avoid database connection setup costs by using a pre-created connection pool?
- Is it possible from HiveQL to call functions defined in the Postgres database, or do I have to rewrite them as UDAFs?
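For reference, here is a minimal sketch (Scala) of the kind of setup I mean, using Spark's JDBC data source. The host, database, table, credentials, and the normalize_amount function are placeholders of mine, and it assumes the Hive Thrift server module is on the classpath:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    object ThriftJdbcBridge {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("thrift-jdbc-bridge")
          .enableHiveSupport()
          .getOrCreate()

        // The JDBC source is lazy: nothing is fetched here, and at query
        // time Spark pushes column pruning and simple filters down to
        // Postgres, rather than pre-selecting the whole table.
        val sales = spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/reports") // placeholder
          .option("dbtable", "public.sales")                       // placeholder
          .option("user", "reporting")
          .option("password", sys.env.getOrElse("PG_PASSWORD", ""))
          .load()
        sales.createOrReplaceTempView("sales")

        // "dbtable" also accepts a subquery, so a Postgres-side function
        // can be evaluated inside Postgres instead of being rewritten as
        // a Spark UDAF. normalize_amount is a hypothetical function that
        // I assume exists in the Postgres database.
        val normalized = spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/reports")
          .option("dbtable",
            "(SELECT id, normalize_amount(amount) AS amount FROM public.sales) AS t")
          .option("user", "reporting")
          .option("password", sys.env.getOrElse("PG_PASSWORD", ""))
          .load()
        normalized.createOrReplaceTempView("sales_normalized")

        // Temp views are scoped to this context, so the Thrift server is
        // started inside the same application to make them visible to
        // JDBC/ODBC clients.
        HiveThriftServer2.startWithContext(spark.sqlContext)

        // Keep the driver alive so the Thrift server keeps serving.
        Thread.currentThread().join()
      }
    }

As far as I can tell, the JDBC source opens a fresh connection per partition per query, which is what prompts my connection pool question above.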
Thanks!

--
Ing. Marco Colombo