Hi there,

I would like to use the Apache Arrow execution engine for some computation. I
found that `ExecBatch`, rather than `RecordBatch`, is used for the execution
engine's nodes, and I wonder how I can attach additional information such as
a schema or metadata to an `ExecBatch` during execution so that it can be
used by a custom ExecNode.

In my first use case, the computation flow looks like this:

scanner <===> custom filter node <===> query client

1) The scanner is a custom scanner that loads data from disk. It accepts a
pushed-down custom filter expression (not an Arrow filter expression, but a
homebrewed one), and it uses this expression to avoid loading data from disk
as much as possible. However, it may return a superset of the matching data
to the successor nodes because of the limited capability of the pushed-down
filter.

2) Its successor is a filter node, which does some additional filtering if
needed. The scanner knows whether a retrieved batch needs additional
filtering, and I would like the scanner to pass some batch-specific metadata,
such as "additional_filtering_required: true/false", along with the batch to
the filter node, but I cannot figure out how this could be done with
`ExecBatch`.
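To make the pattern concrete, here is a minimal stdlib-only sketch of what I
am after. `AnnotatedBatch`, its fields, and the flag name are all hypothetical
stand-ins for illustration; they are not Arrow APIs:

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedBatch:
    """Hypothetical stand-in for an ExecBatch carrying per-batch metadata."""
    values: list            # stand-in for the batch's column values
    length: int             # row count of the batch
    metadata: dict = field(default_factory=dict)  # the extra info I want to attach

# The scanner emits a batch and marks whether it still needs filtering.
scan_out = AnnotatedBatch(
    values=[[1, 2, 3]],
    length=3,
    metadata={"additional_filtering_required": "true"},
)

# The downstream filter node inspects the flag to decide whether to filter.
needs_filter = scan_out.metadata.get("additional_filtering_required") == "true"
```

This is only meant to show the shape of the information flow between the
scanner and the filter node, not a proposal for how Arrow should expose it.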

In my other use case, I would like to attach a batch-specific schema to
each batch returned by certain nodes.

Basically, I wonder whether, within the current framework, there is any way
to attach additional execution metadata or a schema to an `ExecBatch` so
that it can be used by a custom exec node. Could you please help?
Thanks.
