H0TB0X420 opened a new pull request, #1265: URL: https://github.com/apache/datafusion-python/pull/1265
# Which issue does this PR close?

Closes #1227

# Rationale for this change

(From the original issue) PyArrow is a massive dependency (>100MB unpacked) and the only required dependency for datafusion-python. Many Python Arrow libraries implement the PyCapsule Interface, which lets users choose lightweight alternatives such as nanoarrow (~7MB) or arro3, or pass data directly from Polars, DuckDB, etc.

This PR implements the first phase of making PyArrow optional by updating input parameters to accept any Arrow-compatible library via the PyCapsule Interface.

# What changes are included in this PR?

- Add the Protocol type `ArrowSchemaExportable` for the Arrow PyCapsule Interface (see the sketch after this list)
- Update schema parameters in `register_csv`, `register_parquet`, `register_json`, `register_avro`, `register_listing_table`, and the read methods to accept `ArrowSchemaExportable`
- Move the pyarrow import into a `TYPE_CHECKING` block (used only for type hints, optional at runtime)

**Note:** This PR covers input parameters only. Return types (ToPyArrow conversions) still reference pyarrow and will be addressed in a follow-up PR.

# Are there any user-facing changes?

**Breaking changes:** None. All existing PyArrow usage continues to work.

**New functionality:** Users can now pass Arrow schemas from any library implementing `__arrow_c_schema__()` (nanoarrow, arro3, Polars, DuckDB, etc.) to datafusion methods.

**Type hints:** Schema parameters now show `ArrowSchemaExportable | None` instead of `pa.Schema | None`, but accept both.
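
For context, a minimal sketch of what an `ArrowSchemaExportable` Protocol can look like; the exact definition in this PR may differ (e.g. whether it is runtime-checkable), and the loose `object` return annotation for the PyCapsule is an assumption:

```python
from typing import Protocol


class ArrowSchemaExportable(Protocol):
    """Structural type for any object that can export an Arrow schema
    via the Arrow PyCapsule Interface (pyarrow.Schema, nanoarrow or
    arro3 schemas, etc.)."""

    def __arrow_c_schema__(self) -> object:
        """Return a PyCapsule wrapping an ArrowSchema."""
        ...
```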
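
And a hedged usage sketch of the new behavior, assuming nanoarrow's schema helper functions; the `users.csv` file and its column names are hypothetical:

```python
import nanoarrow as na
from datafusion import SessionContext

ctx = SessionContext()

# Build an Arrow schema without pyarrow; any object exposing
# __arrow_c_schema__() is now accepted where a schema is expected.
schema = na.struct({"id": na.int64(), "name": na.string()})

ctx.register_csv("users", "users.csv", schema=schema)
df = ctx.sql("SELECT id, name FROM users LIMIT 5")
```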
