Hi Ishan,

I do not think there is an option to specify compute expressions with Substrait at the moment. There is a plan to support it in C++: https://github.com/apache/arrow/issues/33985. After that, we could bind it in Python and use the functionality in PyArrow as well.
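In the meantime, one possible workaround is to describe the filter in a small structure of your own that the storage engine can inspect, and only build the compute expression from it at the end. The snippet below is just a rough sketch of that idea: `Compare`, `IsIn`, `And`, `conjuncts` and `to_pyarrow` are hypothetical helper names, not part of PyArrow, and it covers only the operators from your example.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union

# Hypothetical helper types, NOT part of PyArrow: a tiny filter tree that the
# storage engine can inspect before it is turned into a compute expression.


@dataclass(frozen=True)
class Compare:
    column: str
    op: str  # one of "==", "!=", "<", "<=", ">", ">="
    value: Any


@dataclass(frozen=True)
class IsIn:
    column: str
    values: Tuple[Any, ...]


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


Node = Union[Compare, IsIn, And]


def conjuncts(node):
    """Flatten nested And nodes into a flat list of leaf predicates."""
    if isinstance(node, And):
        return conjuncts(node.left) + conjuncts(node.right)
    return [node]


def to_pyarrow(node):
    """Build the equivalent pyarrow.compute expression (pyarrow imported lazily)."""
    import operator

    import pyarrow.compute as pc

    ops = {
        "==": operator.eq, "!=": operator.ne,
        "<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge,
    }
    if isinstance(node, And):
        return to_pyarrow(node.left) & to_pyarrow(node.right)
    if isinstance(node, IsIn):
        return pc.field(node.column).isin(list(node.values))
    return ops[node.op](pc.field(node.column), node.value)


# Same filter as in your message: (colA > 3) & colA.isin([10, 11])
flt = And(Compare("colA", ">", 3), IsIn("colA", (10, 11)))

# Leaf predicates the storage engine could look at for pushdown.
pushed_down = conjuncts(flt)
```

Whatever the engine cannot handle itself can still be passed on as `to_pyarrow(flt)` to the dataset filter, so nothing has to be parsed back out of a string.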
Best,
Alenka

On Sun, Feb 26, 2023 at 9:26 AM Ishan Anand <anand.is...@outlook.com> wrote:

> Hi - I am working with a storage format meant to be PyArrow Dataset
> compatible. PyArrow datasets support specifying a filter, written using
> pyarrow.compute expressions, - [link](https://arrow.apache.org/docs/dev/python/generated/pyarrow.dataset.Expression.html).
>
> Does the pyarrow API provide a mechanism to serialize compute expressions
> to a standard format like substrait? I want to analyze the filter
> expression, and push down some of its execution to the storage engine.
>
> Note that casting the filter expression to a string and parsing it is an
> option, but things like the isin operator don't produce easy to parse
> strings.
>
> ```py
> x = pc.field("colA")
> z = (x > 3) & x.isin([10, 11])
> str(z)
>
> # '((colA > 3) and is_in(colA, {value_set=int64:[\n 10,\n 11\n], skip_nulls=false}))'
> ```
>
> Thank you,
> Ishan