Actually, I think I described it backwards. This would be to convert a DataFusion pushdown filter into an Arrow dataset expression, using Substrait as the intermediate representation.

On Mon, Mar 7, 2022 at 11:52 Weston Pace <weston.p...@gmail.com> wrote:

> > but will likely also need a method on PyArrow compute expressions to
> > convert to a Substrait expression.
>
> There is a C++ method to do this (one of the arrow::engine::ToProto
> overloads takes in arrow::compute::Expression and returns
> substrait::Expression) but at the moment the method is internal as we
> completely hide the Substrait/protobuf bindings (e.g. you just
> opaquely go from bytes to Arrow execution plan and back). Can you
> describe a bit more what you'd want to accomplish with a Substrait
> expression in python?
>
> On Mon, Mar 7, 2022 at 8:16 AM Will Jones <will.jones...@gmail.com> wrote:
> >
> > Thanks for starting that, Andy!
> >
> > > I also think it could be helpful with in-memory language
> > > interoperability, such as passing query plans between Python and Rust.
> >
> > Yes! I prototyped a datafusion-python and pyarrow datasets
> > integration[1] a few weeks ago that could really benefit from this.
> > I'll have to look into it more, but will likely also need a method on
> > PyArrow compute expressions to convert to a Substrait expression.
> >
> > [1] https://github.com/datafusion-contrib/datafusion-python/pull/21
> >
> > On Mon, Mar 7, 2022 at 8:40 AM Wang Xudong <wxd963996...@gmail.com> wrote:
> >
> > > Thank you!
> > > This is a great idea, I'll try to contribute some code when I have
> > > time!
> > >
> > > ---
> > > xudong
> > >
> > > On Tue, Mar 8, 2022 at 00:36 Gavin Ray <ray.gavi...@gmail.com> wrote:
> > >
> > > > Incredibly exciting! Following along eagerly =)
> > > >
> > > > On Mon, Mar 7, 2022 at 11:31 AM Andy Grove <andygrov...@gmail.com>
> > > > wrote:
> > > >
> > > > > I created a new repo in the datafusion-contrib GitHub org over the
> > > > > weekend with a starting point for supporting DataFusion as both a
> > > > > producer and consumer of Substrait plans.
> > > > >
> > > > > https://github.com/datafusion-contrib/datafusion-substrait
> > > > >
> > > > > I am hopeful that we can eventually use Substrait in Ballista as a
> > > > > replacement for the current query plan protobuf format, meaning
> > > > > that the Ballista scheduler could potentially be used with engines
> > > > > other than DataFusion.
> > > > >
> > > > > I also think it could be helpful with in-memory language
> > > > > interoperability, such as passing query plans between Python and
> > > > > Rust.
> > > > >
> > > > > I plan on continuing to merge my own PRs here as I flesh out more
> > > > > of this, at least until there are other contributors.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andy.