[Rust] Arrow SQL Adapters/Connectors

Neville Dipale Sat, 26 Sep 2020 19:22:44 -0700

Hi Arrow developers

I would like to gauge the appetite for an Arrow SQL connector that:


* Reads and writes Arrow data to and from SQL databases
* Reads tables and queries into record batches, and writes batches to
tables (either append or overwrite)
* Leverages binary SQL formats where available (e.g. PostgreSQL's binary COPY
format is relatively easy to implement and well-documented)
* Provides a batch interface that abstracts away the different database
semantics, and exposes a RecordBatchReader (
https://docs.rs/arrow/1.0.1/arrow/record_batch/trait.RecordBatchReader.html),
and perhaps a RecordBatchWriter
* Resides in the Rust repo, either as an arrow::sql module (like arrow::csv,
arrow::json and arrow::ipc) or as a separate crate in the workspace
(*arrow-sql*?)
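The batch interface described above might look roughly like the following sketch. It deliberately avoids the arrow crate so that it compiles standalone: `Batch` is a hypothetical stand-in for `RecordBatch` (a single nullable i64 column), and `SqlConnector`, `WriteMode` and `MemoryDb` are illustrative names, not an existing API. The in-memory "database" only exists to exercise the traits; it reads by table name rather than by query.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for arrow's RecordBatch: one nullable i64 column.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub values: Vec<Option<i64>>,
}

/// Whether a write appends to the target table or replaces its contents.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WriteMode {
    Append,
    Overwrite,
}

/// Hypothetical connector trait hiding database-specific semantics.
pub trait SqlConnector {
    /// Reads a table as an iterator of batches of at most `batch_size` rows,
    /// mirroring the iterator-of-batches shape of RecordBatchReader.
    fn read_table(
        &self,
        table: &str,
        batch_size: usize,
    ) -> Result<Box<dyn Iterator<Item = Batch> + '_>, String>;

    /// Writes batches to a table and returns the number of rows written.
    fn write_table(&mut self, table: &str, batches: &[Batch], mode: WriteMode)
        -> Result<usize, String>;
}

/// In-memory mock "database" (hypothetical), used only to exercise the traits.
#[derive(Default)]
pub struct MemoryDb {
    tables: HashMap<String, Vec<Option<i64>>>,
}

impl SqlConnector for MemoryDb {
    fn read_table(
        &self,
        table: &str,
        batch_size: usize,
    ) -> Result<Box<dyn Iterator<Item = Batch> + '_>, String> {
        if batch_size == 0 {
            return Err("batch_size must be positive".to_string());
        }
        let rows = self
            .tables
            .get(table)
            .ok_or_else(|| format!("no such table: {table}"))?;
        // chunks() splits the stored rows into batches of at most batch_size rows.
        Ok(Box::new(rows.chunks(batch_size).map(|c| Batch { values: c.to_vec() })))
    }

    fn write_table(&mut self, table: &str, batches: &[Batch], mode: WriteMode)
        -> Result<usize, String> {
        let rows = self.tables.entry(table.to_string()).or_default();
        if mode == WriteMode::Overwrite {
            // Overwrite discards existing rows before writing the new batches.
            rows.clear();
        }
        let mut written = 0;
        for batch in batches {
            rows.extend_from_slice(&batch.values);
            written += batch.values.len();
        }
        Ok(written)
    }
}

fn main() {
    let mut db = MemoryDb::default();
    let batch = Batch { values: vec![Some(1), None, Some(3)] };
    db.write_table("t", &[batch.clone()], WriteMode::Append).unwrap();
    db.write_table("t", &[batch], WriteMode::Append).unwrap();
    // Six rows read back in batches of four.
    let sizes: Vec<usize> = db.read_table("t", 4).unwrap().map(|b| b.values.len()).collect();
    println!("{:?}", sizes); // [4, 2]
}
```

Returning a boxed iterator keeps the trait object-safe, so a caller could hold a `Box<dyn SqlConnector>` without knowing which database sits behind it.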

I would be able to contribute a Postgres reader/writer as a start.
I could make this a separate crate, but to drive adoption I would prefer it
to live in Arrow. That way it also stays up to date (sometimes we reorganise
modules and end up breaking downstream dependencies).
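As a rough illustration of why PostgreSQL's binary COPY format is approachable, a Postgres writer could encode rows in that format and stream them with `COPY table_name FROM STDIN WITH (FORMAT binary)`. The sketch below encodes a single nullable BIGINT (int8) column; `encode_int8_copy` is a hypothetical name, only one type is handled, and a real writer would map every Arrow type and send the bytes through a Postgres client library.

```rust
/// Signature that starts every PostgreSQL binary COPY stream.
const PGCOPY_SIGNATURE: &[u8; 11] = b"PGCOPY\n\xff\r\n\0";

/// Encodes one nullable BIGINT (int8) column in PostgreSQL's binary COPY format.
/// All integers are written big-endian (network byte order).
fn encode_int8_copy(values: &[Option<i64>]) -> Vec<u8> {
    let mut buf = Vec::new();
    // Header: signature, 32-bit flags field, 32-bit header extension length.
    buf.extend_from_slice(PGCOPY_SIGNATURE);
    buf.extend_from_slice(&0u32.to_be_bytes());
    buf.extend_from_slice(&0u32.to_be_bytes());
    for value in values {
        // Each tuple starts with a 16-bit field count (one column here).
        buf.extend_from_slice(&1i16.to_be_bytes());
        match value {
            // 32-bit field length in bytes, followed by the value itself.
            Some(v) => {
                buf.extend_from_slice(&8i32.to_be_bytes());
                buf.extend_from_slice(&v.to_be_bytes());
            }
            // A field length of -1 marks NULL; no value bytes follow.
            None => buf.extend_from_slice(&(-1i32).to_be_bytes()),
        }
    }
    // Trailer: a 16-bit field count of -1.
    buf.extend_from_slice(&(-1i16).to_be_bytes());
    buf
}

fn main() {
    let buf = encode_int8_copy(&[Some(1), None]);
    // 19 header bytes + 14 for the non-null tuple + 6 for the NULL tuple + 2 trailer
    println!("{} bytes", buf.len()); // 41 bytes
}
```

A reader would do the reverse: check the signature, skip the header extension, then decode tuples until it meets the -1 trailer, appending each field to the matching Arrow array builder.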

Also, being developed next to DataFusion could allow DataFusion to support
SQL databases as yet another data source.

Some questions:
* Should such a library support async IO, sync IO, or both?
* Other than Postgres, what other databases would be interesting? I'm
hoping that once we've established a suitable API, it would be easier to
natively support more database types.

Potential concerns:

* Sparse database support
It's a lot of effort to write database connectors, especially when starting
from scratch (unlike, say, with JDBC). What if we end up supporting only one
or two database servers?
In that case, perhaps we could keep the module unpublished on crates.io
until we're happy with its database support, or even its usage.

* Dependency bloat
We could feature-gate database types to reduce the number of dependencies
when only certain DB connectors are needed.
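Feature-gating could be as simple as compiling each connector module only when its Cargo feature is enabled. In this sketch the `postgres` and `mysql` feature names are hypothetical; in a real crate each would be declared under `[features]` in Cargo.toml and would enable the corresponding optional driver dependency, so a build without the feature pulls in none of that driver's crates.

```rust
// Each connector lives in its own module, compiled only when its
// (hypothetical) Cargo feature is enabled.
#[cfg(feature = "postgres")]
pub mod postgres {
    // A Postgres reader/writer built on an optional client dependency.
}

#[cfg(feature = "mysql")]
pub mod mysql {
    // A MySQL reader/writer built on an optional client dependency.
}

/// Reports which connectors were compiled into this build.
pub fn enabled_connectors() -> Vec<&'static str> {
    // cfg! evaluates to a compile-time boolean for each feature flag.
    let candidates = [
        ("postgres", cfg!(feature = "postgres")),
        ("mysql", cfg!(feature = "mysql")),
    ];
    candidates
        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // With no features enabled this prints [].
    println!("{:?}", enabled_connectors());
}
```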

* Why not use Java's JDBC adapter?
I already do this, but when working on a Rust project, creating a
separate JVM service solely to extract Arrow data is a lot of effort.
I also don't think it's currently possible to use the adapter to save Arrow
data in a database.

* What about Flight SQL extensions?
There have been discussions around creating Flight SQL extensions; the
Rust SQL adapter could implement them and co-exist well with them.
From a crate dependency perspective, *arrow-flight* depends on *arrow*, so
it could also depend on this *arrow-sql* crate.

Please let me know what you think.

Regards
Neville
