Actually, I think I described it backwards. This would be to convert a
DataFusion pushdown filter into an Arrow dataset expression, using Substrait
as the intermediate representation.

On Mon, Mar 7, 2022 at 11:52 Weston Pace <weston.p...@gmail.com> wrote:

> > but will likely also need a method on PyArrow compute expressions to
> > convert to a Substrait expression.
>
> There is a C++ method to do this (one of the arrow::engine::ToProto
> overloads takes in arrow::compute::Expression and returns
> substrait::Expression) but at the moment the method is internal as we
> completely hide the Substrait/protobuf bindings (e.g. you just
> opaquely go from bytes to Arrow execution plan and back).  Can you
> describe a bit more what you'd want to accomplish with a Substrait
> expression in python?
>
> On Mon, Mar 7, 2022 at 8:16 AM Will Jones <will.jones...@gmail.com> wrote:
> >
> > Thanks for starting that, Andy!
> >
> > > I also think it could be helpful with in-memory language
> > > interoperability, such as passing query plans between Python and Rust.
> >
> > Yes! I prototyped a datafusion-python and pyarrow datasets
> > integration[1] a few weeks ago that could really benefit from this.
> > I'll have to look into it more, but will likely also need a method on
> > PyArrow compute expressions to convert to a Substrait expression.
> >
> > [1] https://github.com/datafusion-contrib/datafusion-python/pull/21
> >
> > On Mon, Mar 7, 2022 at 8:40 AM Wang Xudong <wxd963996...@gmail.com> wrote:
> >
> > > Thank you!
> > > This is a great idea. I'll try to contribute some code when I have
> > > time!
> > >
> > > ---
> > > xudong
> > >
> > > On Tue, Mar 8, 2022 at 00:36, Gavin Ray <ray.gavi...@gmail.com> wrote:
> > >
> > > > Incredibly exciting! Following along eagerly =)
> > > >
> > > > On Mon, Mar 7, 2022 at 11:31 AM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > > > I created a new repo in the datafusion-contrib GitHub org over the
> > > > > weekend with a starting point for supporting DataFusion as both a
> > > > > producer and consumer of Substrait plans.
> > > > >
> > > > > https://github.com/datafusion-contrib/datafusion-substrait
> > > > >
> > > > > I am hopeful that we can eventually use Substrait in Ballista as a
> > > > > replacement for the current query plan protobuf format, meaning that
> > > > > the Ballista scheduler could potentially be used with engines other
> > > > > than DataFusion.
> > > > >
> > > > > I also think it could be helpful with in-memory language
> > > > > interoperability, such as passing query plans between Python and Rust.
> > > > >
> > > > > I plan on continuing to merge my own PRs here as I flesh out more of
> > > > > this, at least until there are other contributors.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andy.
> > > > >
> > > >
> > >
>
