Re: [PROPOSAL] ADBC Provider for Apache Airflow

Jarek Potiuk Tue, 03 Mar 2026 08:09:54 -0800

Big +1. Arrow and ADBC have been gaining significant traction recently and
can enormously speed up data processing. Those who use Arrow and its in-memory
columnar storage to access analytical data often get improvements of multiple
orders of magnitude. This also shows the power of our "common"
abstractions, where you can just drop in the ADBC Hook instead of
whatever "regular" driver you used before and it should **just work** (only
many times faster in a number of cases). And you will be able to do way
more using the very same hooks you already know how to use in your Dags.


BTW, we already have "apache-airflow-providers-apache-arrow" reserved on
PyPI. Arrow support was inevitable :).

On Tue, Mar 3, 2026 at 4:53 PM Blain David wrote:

> Hello everyone,
>
> Following some initial discussions with Jarek Potiuk and a previously
> opened PR, I would like to formally propose the introduction of an Apache
> Arrow / ADBC provider for Airflow.
>
> Context & Motivation:
>
> While Airflow has a rich set of database-specific providers, the data
> ecosystem is rapidly shifting toward ADBC (Arrow Database Connectivity).
> ADBC solves many of the "bottleneck" issues associated with traditional
> DB-API 2.0, ODBC or JDBC drivers by leveraging columnar data access and
> Arrow-native memory representation.
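The sketch below illustrates the row-versus-columnar distinction using only Python's built-in `sqlite3`, which is a PEP 249 driver. The cursor returns one tuple per row. The hypothetical helper `fetch_columnar` then regroups the same values into one sequence per column, which is the layout Arrow uses. This is a conceptual sketch only. Arrow keeps each column in typed, contiguous buffers rather than Python lists. An ADBC driver aims to hand those buffers over directly, instead of first materializing per-row Python objects as this conversion still does.

```python
import sqlite3


def fetch_columnar(cursor):
    """Hypothetical helper: regroup a PEP 249 cursor's row tuples by column."""
    names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()  # row-based: one Python tuple per row
    # Columnar view: column name -> list of that column's values.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, "Paris", 12.5), (2, "Berlin", 8.0), (3, "Paris", 20.0)],
)

cur = conn.execute("SELECT id, city, fare FROM trips ORDER BY id")
columns = fetch_columnar(cur)
conn.close()

# A column-wise aggregate only needs to touch the one column it reads.
print(columns["fare"], sum(columns["fare"]))  # [12.5, 8.0, 20.0] 40.5
```
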
>
> We are seeing significant momentum here:
>
>
>   *   Performance: Significant reduction in serialization overhead for
> bulk operations. While results vary by driver maturity and server-side
> native Arrow support (e.g., Arrow Flight endpoints), ADBC provides a much higher
> performance ceiling than standard PEP 249 drivers.
>   *   Standardization: Systems like Snowflake, Apache DataFusion and
> DuckDB are increasingly treating Arrow as a first-class citizen.
>   *   Future-proofing: Tools like dbt-fusion and various lakehouse
> architectures are moving toward Arrow-based execution.
>
> The Proposal:
>
> I propose adding an apache-airflow-providers-apache-arrow (or similar)
> that introduces an AdbcHook.
>
> Key Technical Highlights:
>
>
>   *   Compatibility: By implementing DbApiHook, the AdbcHook will be
> immediately compatible with existing SQL operators.
>   *   Efficiency: It will offer a high-performance alternative to
> traditional row-based drivers without requiring users to rewrite their DAG
> logic.
>   *   Scope: Focus on providing a standardized interface for Arrow-native
> bulk reads and writes (planned as a future enhancement of AdbcHook).
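The following sketch shows why implementing `DbApiHook` buys compatibility with existing operators. It relies on three hypothetical stand-ins:

- `MiniDbApiHook`, a heavily reduced base class in place of Airflow's real `DbApiHook`, whose actual methods take more arguments and resolve Airflow connections.
- `SqliteFileHook`, a subclass backed by the stdlib `sqlite3` driver.
- `run_sql_task`, a function standing in for a SQL operator.

Subclasses supply only `get_conn()`. The query helpers, and anything built on them, are inherited unchanged.

```python
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod
from contextlib import closing


class MiniDbApiHook(ABC):
    """Hypothetical, heavily reduced stand-in for Airflow's DbApiHook.

    Subclasses only provide a PEP 249 (DB-API 2.0) connection; the generic
    helpers below work with any driver that follows that interface.
    """

    @abstractmethod
    def get_conn(self):
        """Return a new PEP 249 connection."""

    def run(self, sql, parameters=()):
        # Execute a statement and commit; nothing here is driver-specific.
        with closing(self.get_conn()) as conn:
            with closing(conn.cursor()) as cur:
                cur.execute(sql, parameters)
            conn.commit()

    def get_records(self, sql, parameters=()):
        # Execute a query and return all rows as a list of tuples.
        with closing(self.get_conn()) as conn:
            with closing(conn.cursor()) as cur:
                cur.execute(sql, parameters)
                return cur.fetchall()


class SqliteFileHook(MiniDbApiHook):
    """Hypothetical concrete hook: the only driver-specific code is get_conn()."""

    def __init__(self, path):
        self.path = path

    def get_conn(self):
        return sqlite3.connect(self.path)


def run_sql_task(hook, sql):
    """Hypothetical operator body: it relies on the hook contract, not the driver."""
    return hook.get_records(sql)


with tempfile.TemporaryDirectory() as tmp:
    hook = SqliteFileHook(os.path.join(tmp, "demo.db"))
    hook.run("CREATE TABLE t (x INTEGER)")
    hook.run("INSERT INTO t VALUES (?), (?)", (1, 2))
    result = run_sql_task(hook, "SELECT x FROM t ORDER BY x")

print(result)  # [(1,), (2,)]
```

ADBC's Python driver packages also expose PEP 249 connections, for example `adbc_driver_sqlite.dbapi.connect()`. An ADBC-backed subclass would therefore, in principle, only need to change `get_conn()`. Arrow-native paths such as the ADBC cursor's `fetch_arrow_table()` would be additions on top, in line with the bulk read/write scope described in the proposal.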
>
> Community & Maintenance:
>
> I have already started the groundwork in a Draft PR (#52330).
>
> I believe this aligns with the project's goal of supporting
> high-performance data engineering patterns. I'm looking for feedback on:
>
>
>   *   Naming: Should this be a standalone adbc provider or part of an
> apache.arrow provider? I chose the latter, but this is open for discussion.
>   *   Scope: For now I have focused purely on the
> Hook/Connection. Since it extends DbApiHook and implements all required
> methods, it is already directly usable in the SQL operators.
>
> I'd love to gather your thoughts and gauge interest before moving to a
> formal voting thread.
>
> Draft PR: https://github.com/apache/airflow/pull/52330
>
> Best regards,
> David
>
