Hi Jorge,
I think this would certainly be a valuable contribution.  How were you
thinking of hosting it (which repo) and publishing it (maintaining a
separate wheel)?  Also, did you have thoughts on integration testing with
pyarrow?

Cheers,
Micah

On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> I filed a PR [1] to open up a discussion to incorporate python-datafusion
> [2] into the Apache Arrow project.
>
> Python-datafusion is a Python library [3] built on top of DataFusion that
> enables people to use DataFusion from Python. It leverages the C data
> interface for zero-copy data exchange between DataFusion and pyarrow (only
> a bunch of pointers are shared around).
>
> For example, it allows users to read a CSV from Rust, pass the arrays to a
> C++ kernel, continue the computation in Rust's kernels, and export to
> parquet using Rust (or C++ parquet, or whatever ^_^). It supports UDFs and
> UDAFs, in case someone wants to go crazy with pyarrow, pandas, NumPy or
> TensorFlow. =)
>
> Best,
> Jorge
>
> [1] https://github.com/apache/arrow-datafusion/pull/69
> [2] https://github.com/jorgecarleitao/datafusion-python
> [3] https://pypi.org/project/datafusion/
>
