Re: Integration between ibis-substrait and Acero

Li Jin Wed, 05 Oct 2022 11:33:29 -0700

Hmm. Thanks for the update. Now that I have searched the code more, it seems
I should perhaps be using "compile" rather than "translate":


https://github.com/ibis-project/ibis-substrait/blob/main/ibis_substrait/compiler/core.py#L82

Let me try some more
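
A minimal sketch of what trying "compile" could look like, assuming that SubstraitCompiler.compile() returns a complete Substrait Plan message (unlike translate(), which returned a bare relation in the output quoted below). The helper names make_table_provider and run_compiled_plan are hypothetical. The optional schema argument is only there because the table provider callback signature is assumed to differ between pyarrow versions. Whether Acero accepts the resulting plan has not been verified here.

```python
def make_table_provider(tables):
    """Build a callback that resolves Substrait named tables from a dict.

    `tables` maps a table name to the object to return (e.g. a pyarrow Table).
    """
    def provider(names, schema=None):
        # Assumption: some pyarrow versions call provider(names) and others
        # provider(names, schema); the default argument accepts both.
        if not names:
            raise Exception("No names provided")
        try:
            return tables[names[0]]
        except KeyError:
            raise Exception(f"Unknown table name {names}") from None
    return provider


def run_compiled_plan():
    """Compile an ibis expression to a Substrait Plan and hand it to Acero."""
    # Third-party imports live here so the helper above has no dependencies.
    import ibis
    import pyarrow as pa
    import pyarrow.substrait  # noqa: F401  (makes pa.substrait available)
    from ibis_substrait.compiler.core import SubstraitCompiler

    t = ibis.table([("a", "int64"), ("b", "int64")], name="table0")
    test_table_0 = pa.Table.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})

    # compile() instead of translate(): assumed to produce a Plan that
    # carries the "relations" entry the consumer looks for.
    plan = SubstraitCompiler().compile(t)

    # run_query expects a pyarrow Buffer, not raw Python bytes.
    plan_buf = pa.py_buffer(plan.SerializeToString())
    reader = pa.substrait.run_query(
        plan_buf,
        table_provider=make_table_provider({"table0": test_table_0}),
    )
    return reader.read_all()
```

Calling run_compiled_plan() requires ibis, ibis-substrait and pyarrow to be installed; make_table_provider() on its own is plain Python and can be checked in isolation.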

On Wed, Oct 5, 2022 at 1:42 PM Will Jones <will.jones...@gmail.com> wrote:

> Hi Li Jin,
>
> The original segfault seems to occur because you are passing a Python bytes
> object and not a PyArrow Buffer object. You can wrap the bytes object using
> pa.py_buffer():
>
> pa.substrait.run_query(pa.py_buffer(result_bytes), table_provider)
>
>
> That being said, when I run your full example with that change, I now get a
> different error, similar to the one you get when you pass the plan in as JSON:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/_substrait.pyx", line 140, in pyarrow._substrait.run_query
>     c_reader = GetResultValue(c_res_reader)
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
>     raise ArrowInvalid(message)
> pyarrow.lib.ArrowInvalid: ExecPlan has no node
> /Users/willjones/Documents/arrows/arrow/cpp/src/arrow/engine/substrait/util.cc:82  plan_->Validate()
> /Users/willjones/Documents/arrows/arrow/cpp/src/arrow/engine/substrait/util.cc:131  executor.Execute()
>
>
> We get the same error even if I add operations onto the plan:
>
> result = translate(t.group_by("a").mutate(z = t.b.sum()), compiler)
> print(result)
>
>
> project {
>   input {
>     read {
>       base_schema {
>         names: "a"
>         names: "b"
>         struct {
>           types {
>             i64 {
>               nullability: NULLABILITY_NULLABLE
>             }
>           }
>           types {
>             i64 {
>               nullability: NULLABILITY_NULLABLE
>             }
>           }
>           nullability: NULLABILITY_REQUIRED
>         }
>       }
>       named_table {
>         names: "table0"
>       }
>     }
>   }
>   expressions {
>     selection {
>       direct_reference {
>         struct_field {
>         }
>       }
>       root_reference {
>       }
>     }
>   }
>   expressions {
>     selection {
>       direct_reference {
>         struct_field {
>           field: 1
>         }
>       }
>       root_reference {
>       }
>     }
>   }
>   expressions {
>     window_function {
>       function_reference: 1
>       partitions {
>         selection {
>           direct_reference {
>             struct_field {
>             }
>           }
>           root_reference {
>           }
>         }
>       }
>       upper_bound {
>         unbounded {
>         }
>       }
>       lower_bound {
>         unbounded {
>         }
>       }
>       phase: AGGREGATION_PHASE_INITIAL_TO_RESULT
>       output_type {
>         i64 {
>           nullability: NULLABILITY_NULLABLE
>         }
>       }
>       arguments {
>         value {
>           selection {
>             direct_reference {
>               struct_field {
>                 field: 1
>               }
>             }
>             root_reference {
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
>
> Full reproduction:
>
> import pyarrow as pa
> import pyarrow.substrait
> import ibis
> from ibis_substrait.compiler.core import SubstraitCompiler
> from ibis_substrait.compiler.translate import translate
>
>
> compiler = SubstraitCompiler()
>
>
> t = ibis.table([("a", "int64"), ("b", "int64")], name="table0")
> result = translate(t.group_by("a").mutate(z = t.b.sum()), compiler)
>
> def table_provider(names):
>     if not names:
>         raise Exception("No names provided")
>     elif names[0] == 'table0':
>         return test_table_0
>     else:
>         raise Exception(f"Unknown table name {names}")
>
>
> test_table_0 = pa.Table.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
>
> result_bytes = result.SerializeToString()
>
> pa.substrait.run_query(pa.py_buffer(result_bytes), table_provider)
>
> Best,
>
> Will Jones
>
> On Tue, Oct 4, 2022 at 12:30 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> > For reference, this is the "relations" entry that I was referring to:
> >
> >
> > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_substrait.py#L186
> >
> > On Tue, Oct 4, 2022 at 3:28 PM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > > So I made some progress with updated code:
> > >
> > > >         t = ibis.table([("a", "int64"), ("b", "int64")], name="table0")
> > > >         test_table_0 = pa.Table.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
> > > >
> > > >         result = translate(t, self.compiler)
> > > >
> > > >         def table_provider(names):
> > > >             if not names:
> > > >                 raise Exception("No names provided")
> > > >             elif names[0] == 'table0':
> > > >                 return test_table_0
> > > >             else:
> > > >                 raise Exception(f"Unknown table name {names}")
> > > >
> > > >         print(result)
> > > >         result_buf = pa._substrait._parse_json_plan(tobytes(MessageToJson(result)))
> > > >
> > > >         pa.substrait.run_query(result_buf, table_provider)
> > >
> > > > I think the plan is now passed properly, and I got an "ArrowInvalid:
> > > > Empty substrait plan is passed" error.
> > > >
> > > > Looking at the plan produced by ibis-substrait, it doesn't seem to match
> > > > the format the Acero consumer expects. In particular, the plan produced
> > > > by ibis-substrait doesn't have a "relations" entry. Any thoughts on how
> > > > this can be fixed? (I don't know whether I am using the API wrong or
> > > > there is a format inconsistency between the two.)
> > >
> > > On Tue, Oct 4, 2022 at 1:54 PM Li Jin <ice.xell...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > > >> I am testing integration between ibis-substrait and Acero but hit a
> > > >> segmentation fault. I think this might be because the way I am
> > > >> integrating these two libraries is wrong. Here is my code:
> > >>
> > > >> class BasicTests(unittest.TestCase):
> > > >>     """Test basic features"""
> > > >>
> > > >>     @classmethod
> > > >>     def setUpClass(cls):
> > > >>         cls.compiler = SubstraitCompiler()
> > > >>
> > > >>     def test_named_table(self):
> > > >>         """Test basic"""
> > > >>         t = ibis.table([("a", "int64"), ("b", "int64")], name="table0")
> > > >>         result = translate(t, self.compiler)
> > > >>
> > > >>         def table_provider(names):
> > > >>             if not names:
> > > >>                 raise Exception("No names provided")
> > > >>             elif names[0] == 'table0':
> > > >>                 return test_table_0
> > > >>             else:
> > > >>                 raise Exception(f"Unknown table name {names}")
> > > >>
> > > >>         test_table_0 = pa.Table.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
> > > >>
> > > >>         print(type(result))
> > > >>         print(result)
> > > >>         result_bytes = result.SerializeToString()
> > > >>
> > > >>         pa.substrait.run_query(result_bytes, table_provider)
> > >>
> > >>
> > >> I wonder if someone has tried integration between these two before and
> > >> can share some working code?
> > >>
> > >
> >
>
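
For readers following the "relations" question raised in the oldest messages of this thread: the sketch below uses plain Python dicts to illustrate how a bare relation differs from a full Substrait plan, which wraps each relation in a top-level "relations" list. The dicts are deliberately simplified (no column types, no extension declarations), the helper name wrap_rel_in_plan is hypothetical, and the result is not meant to be a plan that Acero would accept as-is.

```python
import json


def wrap_rel_in_plan(rel, output_names):
    """Wrap a bare relation dict in a plan-shaped dict.

    Mirrors the Substrait layout Plan -> relations[] -> root -> input,
    where the root also lists the output column names.
    """
    return {
        "relations": [
            {"root": {"input": rel, "names": list(output_names)}}
        ]
    }


# A bare relation, roughly the shape of what translate() printed in the
# thread: a read of the named table "table0" (types omitted for brevity).
bare_rel = {
    "read": {
        "baseSchema": {"names": ["a", "b"]},
        "namedTable": {"names": ["table0"]},
    }
}

plan = wrap_rel_in_plan(bare_rel, ["a", "b"])
print(json.dumps(plan, indent=2))
```

The bare relation has no "relations" key at all, which is consistent with a consumer that expects a full plan reporting it as empty.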
