Hi Ishan,

I do not think there is currently a way to serialize compute expressions
to Substrait.
There are plans to support this in Arrow C++
(https://github.com/apache/arrow/issues/33985). Once that is done, we could
bind it in Python so the functionality is also available in PyArrow.

Best,
Alenka

On Sun, Feb 26, 2023 at 9:26 AM Ishan Anand <anand.is...@outlook.com> wrote:

> Hi - I am working with a storage format meant to be PyArrow Dataset
> compatible. PyArrow datasets support specifying a filter, written using
> pyarrow.compute expressions:
> https://arrow.apache.org/docs/dev/python/generated/pyarrow.dataset.Expression.html
>
> Does the pyarrow API provide a mechanism to serialize compute expressions
> to a standard format like substrait? I want to analyze the filter
> expression, and push down some of its execution to the storage engine.
>
> Note that casting the filter expression to a string and parsing it is an
> option, but things like the isin operator don't produce easy-to-parse
> strings.
>
> ```py
> import pyarrow.compute as pc
>
> x = pc.field("colA")
> z = (x > 3) & x.isin([10, 11])
> str(z)
>
> # '((colA > 3) and is_in(colA, {value_set=int64:[\n  10,\n  11\n],
> # skip_nulls=false}))'
> ```
>
> Thank you,
> Ishan
>
>
