Spark on Oracle available as an Apache licensed open source repo

Harish Butani Thu, 13 Jan 2022 16:50:07 -0800

Spark on Oracle is now available as an open source, Apache-licensed GitHub
repo <https://github.com/oracle/spark-oracle>. Build and deploy it as an
extension jar in your Spark clusters.


Use it to combine Apache Spark programs with data in your existing Oracle
databases without expensive data copying or query-time data movement.

The core capability is a set of optimizer extensions that collapse SQL
operator sub-graphs into an OraScan that executes equivalent SQL in Oracle.
Physical plan parallelism
<https://github.com/oracle/spark-oracle/wiki/Query-Splitting> can be
controlled so that Spark tasks operate on Oracle data block ranges, on
result-set pages, or on table partitions.
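
To illustrate the result-set paging option, here is a minimal Python sketch
of how one pushed-down query could be divided into per-task pages. It is not
the project's implementation: the function names, the example query, and the
row counts are hypothetical, and the only Oracle feature assumed is the
standard row-limiting clause (OFFSET ... FETCH NEXT, Oracle 12c and later).

```python
from typing import List, Tuple


def result_set_pages(total_rows: int, rows_per_task: int) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs that together cover total_rows rows.

    Hypothetical helper for illustration; each pair would correspond to one
    Spark task reading one page of the pushed-down query's result set.
    """
    if rows_per_task <= 0:
        raise ValueError("rows_per_task must be positive")
    return [
        (offset, min(rows_per_task, total_rows - offset))
        for offset in range(0, total_rows, rows_per_task)
    ]


def paged_sql(base_query: str, offset: int, limit: int) -> str:
    """Wrap a query with Oracle's row-limiting clause for a single page.

    The base query needs a deterministic ORDER BY, otherwise pages fetched
    by different tasks may overlap or skip rows.
    """
    return f"{base_query} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"


# Hypothetical pushed-down query and sizes, chosen only for the example.
base = "SELECT ss_item_sk, ss_net_paid FROM store_sales ORDER BY ss_item_sk"
task_queries = [paged_sql(base, off, lim) for off, lim in result_set_pages(10, 4)]
```

Splitting on block ranges or table partitions follows the same pattern, with
each task's query restricted to a different slice of the underlying table
rather than to a page of the result.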

We push down large parts of Spark SQL to Oracle; for example, 95 of 99 TPCDS
queries <https://github.com/oracle/spark-oracle/wiki/TPCDS-Queries> are
completely pushed to Oracle.

With Spark SQL macros
<https://github.com/oracle/spark-oracle/wiki/Spark_SQL_macros> you can
write custom Spark UDFs that get translated and pushed as Oracle SQL
expressions.
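
The general idea of translating a function body into a SQL expression,
instead of calling the function row by row, can be shown with a toy
translator. This is only an analogy written in Python: Spark SQL macros
operate on Scala code, and the translator below, its function names, and the
column names in the example are all hypothetical.

```python
import ast

# Arithmetic operators this toy translator knows how to render as SQL.
_BINARY_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_sql(node: ast.AST) -> str:
    """Render a small arithmetic expression tree as a SQL expression string."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        op = _BINARY_OPS[type(node.op)]
        return f"({to_sql(node.left)} {op} {to_sql(node.right)})"
    if isinstance(node, ast.Name):
        # A bare name is treated as a column reference.
        return node.id
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return repr(node.value)
    # Anything else cannot be translated, so it could not be pushed down.
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


def udf_body_to_sql(body: str) -> str:
    """Translate the source text of a one-expression function body to SQL."""
    return to_sql(ast.parse(body, mode="eval"))


# Hypothetical UDF body over two columns.
sql_expr = udf_body_to_sql("price * (1 + tax_rate)")
```

Once a function is expressed as a plain SQL expression like this, it can be
embedded in the query sent to the database; a function that remains an opaque
call cannot.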

With DML pushdown <https://github.com/oracle/spark-oracle/wiki/DML-Support>,
inserts in Spark SQL get pushed as transactionally consistent
inserts/updates on Oracle tables.

See the Quick Start Guide
<https://github.com/oracle/spark-oracle/wiki/Quick-Start-Guide> for how to
set up an Oracle free tier ADW instance, load it with TPCDS data, and try
out the Spark on Oracle Demo
<https://github.com/oracle/spark-oracle/wiki/Demo> on your Spark cluster.

More details can be found in our blog
<https://hbutani.github.io/blogs/blog/Spark_on_Oracle_Blog.html> and
the project wiki <https://github.com/oracle/spark-oracle/wiki>.

regards,
Harish Butani
