Hi Jorge,

Awesome, I think this is a super valuable addition that makes DataFusion much more accessible and approachable for anyone who wants to experiment with it. It would be very cool to update it to the latest version and include it in the project.

Best,
Daniël

On Sun, Apr 25, 2021, 22:32 Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Jorge,
> I think this would certainly be a valuable contribution. How were you
> thinking of hosting (which repo) and publishing it (maintaining a separate
> wheel)? Also, did you have thoughts on integration testing with pyarrow?
>
> Cheers,
> Micah
>
> On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > I filed a PR [1] to open up a discussion to incorporate python-datafusion
> > [2] into the Apache Arrow project.
> >
> > Python-datafusion is a Python library [3] built on top of DataFusion that
> > enables people to use DataFusion from Python. It leverages the C data
> > interface for zero-cost copy between DataFusion and pyarrow (a bunch of
> > pointers is shared around).
> >
> > For example, it allows users to read a CSV from Rust, pass the arrays to a
> > C++ kernel, continue the computation in Rust's kernels, and export to
> > parquet using Rust (or C++ parquet, or whatever ^_^). It supports UDFs and
> > UDAFs, in case someone wants to go crazy with Pyarrow, Pandas, numpy or
> > tensorflow. =)
> >
> > Best,
> > Jorge
> >
> > [1] https://github.com/apache/arrow-datafusion/pull/69
> > [2] https://github.com/jorgecarleitao/datafusion-python
> > [3] https://pypi.org/project/datafusion/