paleolimbot commented on issue #159: URL: https://github.com/apache/sedona-db/issues/159#issuecomment-3364129733
We're using the FFI in a few ways! datafusion-python table providers can be used in `sd.create_data_frame()`:

https://github.com/apache/sedona-db/blob/a15844b4ff9c7e9416b8f1c1c07ad81d908a89cd/python/sedonadb/src/import_from.rs#L56-L64

...and we use the FFI definitions to allow functions to, at least in theory, be defined in separate Python packages (I'd hoped to use this for geography out of the gate, but we just built everything together for the first release):

https://github.com/apache/sedona-db/blob/a15844b4ff9c7e9416b8f1c1c07ad81d908a89cd/rust/sedona/src/ffi.rs#L40-L58

In terms of the Python interface, we can't just use it because we have our own `SessionContext` in Rust land, with our own optimizer rules and everything all assembled together. The FFI from DataFusion isn't stable yet, and it can't serialize all the types of expressions we need it to (notably the non-DataFusion UDFs, if I remember correctly). We also want to avoid having two copies of DataFusion installed (both we and datafusion-python have quite a large installed size).

I do really like their `Expr`, though:

https://github.com/apache/datafusion-python/blob/709c918ef810d7207f12c09b82c2e1b1c4ad8290/python/datafusion/expr.py#L342-L351

...and if we can find a way to (optionally) leverage that, I'd love to!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
