Hi Jorge,

I think this would certainly be a valuable contribution. How were you thinking of hosting it (which repo) and publishing it (maintaining a separate wheel)? Also, did you have thoughts on integration testing with pyarrow?

Cheers,
Micah

On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> I opened a PR [1] to start a discussion about incorporating python-datafusion
> [2] into the Apache Arrow project.
>
> Python-datafusion is a Python library [3] built on top of DataFusion that
> lets people use DataFusion from Python. It leverages the C data interface
> for zero-copy data exchange between DataFusion and pyarrow (a bunch of
> pointers are shared around).
>
> For example, it allows users to read a CSV from Rust, pass the arrays to a
> C++ kernel, continue the computation in Rust's kernels, and export to
> parquet using Rust (or C++ parquet, or whatever ^_^). It supports UDFs and
> UDAFs, in case someone wants to go crazy with Pyarrow, Pandas, numpy or
> tensorflow. =)
>
> Best,
> Jorge
>
> [1] https://github.com/apache/arrow-datafusion/pull/69
> [2] https://github.com/jorgecarleitao/datafusion-python
> [3] https://pypi.org/project/datafusion/
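As a rough illustration of the pointer sharing that the C data interface relies on, here is a stdlib-only Python sketch using ctypes. It mirrors the ArrowSchema and ArrowArray structs from the Arrow C data interface specification; a "producer" (standing in for DataFusion) exports an int64 column, and a "consumer" (standing in for pyarrow) reads it through two integer addresses without copying the values. The names Int64Export, import_int64 and release are hypothetical, and this is not how python-datafusion or pyarrow actually implement the interface (real implementations handle many types, null bitmaps, children and ownership transfer).

```python
import ctypes

# ctypes mirrors of the two structs from the Arrow C data interface spec.
# Classes are declared first because the structs refer to themselves.
class ArrowSchema(ctypes.Structure):
    pass

class ArrowArray(ctypes.Structure):
    pass

SchemaRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ArrayRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", SchemaRelease),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ArrayRelease),
    ("private_data", ctypes.c_void_p),
]


# Release callbacks: the memory is owned by Python objects held by the
# producer, so releasing only marks the struct as released (release = NULL).
@SchemaRelease
def _release_schema(ptr):
    ptr.contents.release = SchemaRelease()


@ArrayRelease
def _release_array(ptr):
    ptr.contents.release = ArrayRelease()


class Int64Export:
    """Hypothetical producer: exposes an int64 buffer through the C structs."""

    def __init__(self, values):
        self.data = (ctypes.c_int64 * len(values))(*values)
        # Buffer 0 is the validity bitmap (NULL: no nulls), buffer 1 the values.
        self.buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(self.data))
        self.schema = ArrowSchema(format=b"l", flags=0, n_children=0,
                                  release=_release_schema)
        self.array = ArrowArray(length=len(values), null_count=0, offset=0,
                                n_buffers=2, n_children=0,
                                buffers=self.buffers, release=_release_array)

    def addresses(self):
        # Only these two integers need to cross the language boundary.
        return ctypes.addressof(self.schema), ctypes.addressof(self.array)


def import_int64(schema_addr, array_addr):
    """Hypothetical consumer: view the producer's values without copying."""
    schema = ArrowSchema.from_address(schema_addr)
    array = ArrowArray.from_address(array_addr)
    if schema.format != b"l" or array.n_buffers != 2:
        raise TypeError("expected a plain int64 array (format 'l')")
    start = array.buffers[1] + array.offset * ctypes.sizeof(ctypes.c_int64)
    # The returned ctypes array aliases the producer's memory.
    return (ctypes.c_int64 * array.length).from_address(start)


def release(schema_addr, array_addr):
    """Consumer signals it is done by calling each struct's release callback."""
    for struct_type, addr in ((ArrowSchema, schema_addr), (ArrowArray, array_addr)):
        struct = struct_type.from_address(addr)
        if struct.release:  # a NULL release means already released
            struct.release(ctypes.byref(struct))
```

Writing through the producer's buffer is visible through the consumer's view, which is the point of the exercise: no values are copied, only the two struct addresses are exchanged.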