Hi,
I just posted some notes on using Spark with Oracle. If you want to do
distributed processing against the DW of your choice, be it Oracle, Hive or
BigQuery, in my experience it is best to create Spark DataFrames on top of
the underlying storage, either through JDBC or through the Spark API (Hive
or BigQuery).
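
For the JDBC route, a minimal PySpark sketch might look like the one below. It only uses Spark's built-in JDBC data source options; the helper names, host, service name, table and credentials are placeholders I made up for illustration, and it assumes the Oracle JDBC driver jar is already on the Spark classpath.

```python
def oracle_jdbc_options(host, port, service, table, user, password,
                        partition_column=None, lower_bound=None,
                        upper_bound=None, num_partitions=None):
    """Build the option map for Spark's built-in JDBC data source.

    Hypothetical helper: all argument values are supplied by the caller.
    """
    opts = {
        # Oracle thin-driver URL in the service-name form
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }
    if partition_column is not None:
        # Spark needs the column, both bounds and the partition count
        # together in order to split the read into parallel queries.
        if None in (lower_bound, upper_bound, num_partitions):
            raise ValueError("partitioned reads need bounds and num_partitions")
        opts.update({
            "partitionColumn": partition_column,
            "lowerBound": str(lower_bound),
            "upperBound": str(upper_bound),
            "numPartitions": str(num_partitions),
        })
    return opts


def read_oracle_table(spark, opts):
    """Return a DataFrame backed by the Oracle table named in opts.

    `spark` is an existing SparkSession created by the caller.
    """
    return spark.read.format("jdbc").options(**opts).load()
```

With a running SparkSession you would call something like read_oracle_table(spark, oracle_jdbc_options("dbhost", 1521, "ORCLPDB1", "SALES.ORDERS", "scott", "tiger")). Without the partitioning options Spark reads the table over a single connection, so for large tables it is usually worth passing a numeric partition column and bounds.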
I have been developing 'Spark on Oracle', a project to provide better
integration of Spark into an Oracle Data Warehouse. You can read about it
at https://hbutani.github.io/spark-on-oracle/blog/Spark_on_Oracle_Blog.html
The key features are Catalog Integration, translation and pushdown of Spark
SQ