Hi Weston,
> Only the minimal schema can be used for the actual compute operations
> while the rest is just carried along and processed by the user at the end
> If all you are doing is custom iterative processing (as opposed to
> running traditional relational algebra operators) the scanner should be
I think the ExecPlan itself would probably need some changes. Right
now each node has an output schema, and most of the node implementations
depend on it in one way or another.
For example, a filter node binds its filter expression to the schema once,
at plan construction time. If the schema varied from batch to batch, that
binding would have to be redone for every batch, along the lines of the
sketch below.
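A minimal sketch of that one-time binding, assuming the Arrow C++
expression API (`field_ref`, `literal`, `greater`, `Expression::Bind`);
with a variable schema the `Bind()` call would have to move into the
per-batch path:

```cpp
#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/compute/expression.h>

namespace cp = arrow::compute;

// Sketch: bind a filter predicate to a fixed schema once, the way a filter
// node does at plan construction time. With a variable schema this Bind()
// would have to be repeated for every incoming batch's schema.
arrow::Status BindOnce() {
  auto schema = arrow::schema({arrow::field("x", arrow::int64())});
  // Unbound predicate: x > 0
  cp::Expression pred =
      cp::greater(cp::field_ref("x"), cp::literal(int64_t{0}));
  // Binding resolves the field reference and result types against this
  // particular schema; a batch with a different schema needs a fresh Bind().
  ARROW_ASSIGN_OR_RAISE(cp::Expression bound, pred.Bind(*schema));
  (void)bound;  // a real node would keep this for per-batch evaluation
  return arrow::Status::OK();
}
```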
> This schema evolution work has not been done in the scanner yet so if
> you are interested you might want to look at ARROW-11003[1].
Thanks. I will keep an eye on it.
> Or did you have a different use case for multiple schemas in mind that
> doesn't quite fit the "promote to common schema" case
> "The consumer of a Scan does not need to know how it is implemented,
> only that a uniform API is provided to obtain the next RecordBatch
> with a known schema.", I interpret this as `Scan` operator may
> produce multiple RecordBatches, and each of them should have a known
> schema, but next batc
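A sketch of that consumer loop, assuming only standard Arrow C++ types;
the `next` callable here is a hypothetical stand-in for whatever the scan
hands back, yielding nullptr at end of stream:

```cpp
#include <arrow/api.h>

#include <functional>
#include <memory>

// Sketch: pull batches from some source until it is exhausted, tolerating
// a schema change between consecutive batches.
arrow::Status Consume(
    const std::function<
        arrow::Result<std::shared_ptr<arrow::RecordBatch>>()>& next) {
  std::shared_ptr<arrow::Schema> last_schema;
  while (true) {
    ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::RecordBatch> batch, next());
    if (batch == nullptr) break;  // end of stream
    if (last_schema == nullptr || !batch->schema()->Equals(*last_schema)) {
      // Schema changed: rebind expressions, remap columns, etc.
      last_schema = batch->schema();
    }
    // ... per-batch processing ...
  }
  return arrow::Status::OK();
}
```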
Hi there,
I am evaluating the Apache Arrow C++ compute engine for my project and
wonder what schema assumptions the execution operators in the compute
engine make.
In my use case, the record batches that feed a computation may have
different schemas. I read the Apache Arrow Query Engine for C++ design
document.
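To illustrate the kind of input I mean, here is a hypothetical source
emitting two consecutive batches whose schemas differ, built with the
regular Arrow C++ builder APIs:

```cpp
#include <arrow/api.h>

#include <memory>

// Hypothetical example of the use case: two consecutive batches from the
// same source whose schemas differ (a column appears in the second batch).
arrow::Status MakeMixedSchemaBatches(
    std::shared_ptr<arrow::RecordBatch>* out1,
    std::shared_ptr<arrow::RecordBatch>* out2) {
  arrow::Int64Builder xs;
  ARROW_RETURN_NOT_OK(xs.AppendValues({1, 2, 3}));
  std::shared_ptr<arrow::Array> x;
  ARROW_RETURN_NOT_OK(xs.Finish(&x));

  // First batch: schema {x: int64}
  auto schema1 = arrow::schema({arrow::field("x", arrow::int64())});
  *out1 = arrow::RecordBatch::Make(schema1, 3, {x});

  // Second batch: schema {x: int64, y: utf8}
  arrow::StringBuilder ys;
  ARROW_RETURN_NOT_OK(ys.AppendValues({"a", "b", "c"}));
  std::shared_ptr<arrow::Array> y;
  ARROW_RETURN_NOT_OK(ys.Finish(&y));
  auto schema2 = arrow::schema(
      {arrow::field("x", arrow::int64()), arrow::field("y", arrow::utf8())});
  *out2 = arrow::RecordBatch::Make(schema2, 3, {x, y});
  return arrow::Status::OK();
}
```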