timsaucer opened a new pull request, #12920:
URL: https://github.com/apache/datafusion/pull/12920

   ## Which issue does this PR close?
   
   This addresses part of 
https://github.com/apache/datafusion-python/issues/823 downstream, but may have 
wider application than just Python.
   
   ## Rationale for this change
   
   This PR allows table providers to be registered via a stable FFI. This 
removes the requirement for Python providers to include all of 
datafusion-python and re-export it, and it allows providers built against 
different underlying DataFusion versions to interoperate.
   
   ## What changes are included in this PR?
   
   Adds support for `TableProvider` via FFI. To support this, it also 
includes `ExecutionPlan`, `SessionConfig`, `PlanProperties`, and `TableType`. 
As this gets used more, I expect we will want to expose other features, but this 
gives an initial implementation that solves an immediate need.
   
   ## Are these changes tested?
   
   Some unit tests are provided. Additionally I did the following test:
   
   I created a separate crate with the contents of `datafusion/ffi` so that I 
could test it against different versions of DataFusion by modifying the 
dependencies in `Cargo.toml`. I used this crate to build a test 
implementation of `datafusion-python` against DataFusion 42.0.0. I then 
adjusted the test crate and built a test implementation of `delta-rs` against 
DataFusion 41.0.0. Finally, I registered the delta table with the session 
context in Python and was able to query it with filter pushdown via this FFI 
interface, even though the underlying DataFusion versions were different.
   
   Additionally, I ran memory leak checks against the provided unit tests and 
while running in Python.
   
   ## Are there any user-facing changes?
   
   This is not a breaking change; it is a pure addition of a new 
`datafusion-ffi` library.
   
   
   ## Remaining Issues
   
   - [ ] There is some inconsistency between the usage of `ExportedXYZ` and 
just using the raw `FFI_XYZ`. We should make the usage consistent across all 
struct types.
   - [ ] Add documentation to explain why the private data and foreign structs 
are constructed the way they are.
   - [ ] Add documentation to explain more clearly the delineation between 
`ExportedXYZ` and `ForeignXYZ`. A worked use case would probably help, since 
which side is "foreign" and which is "exported" can be hard to follow during 
some of the function calls.
   - [ ] It would be *great* to demonstrate a C++ implementation linked against 
the Rust DataFusion library. This might really open the doors for some 
implementations that are not feasible to convert to Rust.
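
   To make the exported/foreign split above concrete, here is a minimal toy 
sketch of the general pattern: a `#[repr(C)]` struct of function pointers plus 
opaque private data on the exporting side, and a wrapper on the consuming side. 
Every name in it (`RowSource`, `FFI_RowSource`, `ForeignRowSource`, 
`round_trip`) is hypothetical and only mirrors the naming style used in this 
PR; it is not the actual `datafusion-ffi` API.

```rust
use std::ffi::c_void;

/// Toy stand-in for a provider trait (hypothetical, not DataFusion's `TableProvider`).
pub trait RowSource {
    fn row_count(&self) -> u64;
}

/// C-layout struct that crosses the library boundary. Only function
/// pointers and an opaque pointer are exposed, so neither side depends
/// on the other's Rust type layout.
#[repr(C)]
#[allow(non_camel_case_types)]
pub struct FFI_RowSource {
    pub row_count: unsafe extern "C" fn(source: *const FFI_RowSource) -> u64,
    pub release: unsafe extern "C" fn(source: *mut FFI_RowSource),
    pub private_data: *mut c_void,
}

// Runs in the exporting library: recovers the real object from private_data.
unsafe extern "C" fn row_count_wrapper(source: *const FFI_RowSource) -> u64 {
    unsafe {
        let inner: &Box<dyn RowSource> =
            &*((*source).private_data as *const Box<dyn RowSource>);
        inner.row_count()
    }
}

// Runs in the exporting library: frees private_data exactly once.
unsafe extern "C" fn release_wrapper(source: *mut FFI_RowSource) {
    unsafe {
        let data = (*source).private_data as *mut Box<dyn RowSource>;
        if !data.is_null() {
            drop(Box::from_raw(data));
            (*source).private_data = std::ptr::null_mut();
        }
    }
}

impl FFI_RowSource {
    /// Exporting side: box the implementation and hide it behind private_data.
    pub fn new(inner: Box<dyn RowSource>) -> Self {
        Self {
            row_count: row_count_wrapper,
            release: release_wrapper,
            private_data: Box::into_raw(Box::new(inner)) as *mut c_void,
        }
    }
}

impl Drop for FFI_RowSource {
    fn drop(&mut self) {
        // Always free through the exporter's own release function.
        let release = self.release;
        unsafe { release(self) }
    }
}

/// Consuming side: implements the trait by calling through the function pointers.
pub struct ForeignRowSource(pub FFI_RowSource);

impl RowSource for ForeignRowSource {
    fn row_count(&self) -> u64 {
        unsafe { (self.0.row_count)(&self.0) }
    }
}

/// Example implementation living in the "exporting" library.
struct InMemory {
    rows: u64,
}

impl RowSource for InMemory {
    fn row_count(&self) -> u64 {
        self.rows
    }
}

/// Export a source, wrap it as foreign, and call back through the FFI struct.
pub fn round_trip(rows: u64) -> u64 {
    let exported = FFI_RowSource::new(Box::new(InMemory { rows }));
    let foreign = ForeignRowSource(exported);
    foreign.row_count()
}
```

   In this sketch the consumer never touches `private_data` directly. It only 
calls the function pointers, which is what would let the two sides be compiled 
against different library versions.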

