Hello,

Is there a way in Spark to define a data source once (say a JDBC source) and
then list the tables to be used against that data source? Something like a
plain JDBC connection, where we define the connection once and then execute
statements against it. In the current external table implementation, each
table requires the complete data source information (url, credentials, etc.);
a rough sketch of what that looks like is below.
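For illustration, here is roughly what each table definition looks like today
with the DataFrame reader. The urls, credentials, and table names are only
placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-tables").getOrCreate()

// Each external table carries its own complete connection details.
val orders = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://database1:5432/sales")
  .option("dbtable", "orders")
  .option("user", "spark")
  .option("password", "secret")
  .load()

// The same url/user/password have to be repeated for every other table
// on the same database.
val customers = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://database1:5432/sales")
  .option("dbtable", "customers")
  .option("user", "spark")
  .option("password", "secret")
  .load()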

The use case is roughly this: I have n tables on database1 and m tables on
database2, and a complex query that combines tables from both. Can Spark SQL
decompose that complex query into data-source-specific queries, i.e. a
source 1 query (over the n tables) executed on database1 and a source 2 query
(over the m tables) executed on database2, with the two results then
joined/merged in the Spark layer to produce the final output? And will
pushdown optimization work at the data source level, or only per individual
external table? A sketch of the kind of query I mean follows.
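Something like the sketch below, where the join spans the two databases
(again, all names are only placeholders):

// Tables from database1 and database2, registered as temp views.
orders.createOrReplaceTempView("db1_orders")

val shipments = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://database2:5432/logistics")
  .option("dbtable", "shipments")
  .option("user", "spark")
  .option("password", "secret")
  .load()
shipments.createOrReplaceTempView("db2_shipments")

// The filter on db1_orders could be pushed down to database1, but does the
// join below run entirely in Spark, or can each side be sent to its own
// database as a source-specific query first?
val result = spark.sql("""
  SELECT o.order_id, s.status
  FROM db1_orders o
  JOIN db2_shipments s ON o.order_id = s.order_id
  WHERE o.order_date >= '2015-01-01'
""")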


Thanks

Sathish
