Re: output_schema for ExecNode

2021-11-17 Thread Yue Ni
Hi Weston,

> Only the minimal schema can be used for the actual compute operations
> while the rest is just carried along and processed by the user at the end

> If all you are doing is custom iterative processing (as opposed to
> running traditional relational algebra operators) the scanner should be
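The idea quoted above, splitting each batch into the minimal set of columns the compute operations need while the remaining columns are merely carried along to the consumer, can be sketched in plain Python (this is a conceptual illustration, not the Arrow C++ API; the column names and the `split_batch` helper are hypothetical):

```python
# Conceptual sketch: separate the columns the compute step actually uses
# (the "minimal schema") from the extra columns that are only carried
# along for the user to process at the end.

MINIMAL_SCHEMA = ["id", "value"]  # hypothetical compute columns

def split_batch(batch: dict) -> tuple[dict, dict]:
    """Return (compute_columns, carried_columns) for one record batch."""
    compute = {name: batch[name] for name in MINIMAL_SCHEMA if name in batch}
    carried = {name: col for name, col in batch.items()
               if name not in MINIMAL_SCHEMA}
    return compute, carried

batch = {"id": [1, 2], "value": [10.0, 20.0], "extra": ["a", "b"]}
compute, carried = split_batch(batch)
# compute -> {"id": [1, 2], "value": [10.0, 20.0]}
# carried -> {"extra": ["a", "b"]}
```

Only `compute` needs a stable schema across batches; `carried` can vary freely because no operator ever inspects it.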

Re: output_schema for ExecNode

2021-11-16 Thread Weston Pace
I think the ExecPlan itself would probably need some changes. Right now each node has an output schema. Most of the node implementations depend on this in some way or another. For example, a filter node binds the expression to the schema once at plan construction time. If the schema is variable
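The point about a filter node binding its expression to the schema once at plan construction can be illustrated with a small plain-Python sketch (not the Arrow implementation; `bind` and `filter_batch` are hypothetical stand-ins). Binding resolves a field name to a column index up front, which is exactly what breaks if later batches arrive with a different schema:

```python
# Conceptual sketch: an expression is "bound" to the node's output schema
# once, at plan-construction time, by resolving a field name to an index.

def bind(field_name: str, schema: list[str]) -> int:
    """Resolve a field name to a column index against a fixed schema."""
    return schema.index(field_name)

schema = ["id", "value"]
value_idx = bind("value", schema)  # bound once, never re-resolved

def filter_batch(columns: list[list], predicate) -> list[list]:
    """Keep rows where predicate holds, using the pre-bound index."""
    keep = [predicate(v) for v in columns[value_idx]]
    return [[x for x, k in zip(col, keep) if k] for col in columns]

batch = [[1, 2, 3], [10.0, 25.0, 5.0]]  # matches ["id", "value"]
filtered = filter_batch(batch, lambda v: v > 9.0)
# filtered -> [[1, 2], [10.0, 25.0]]
# A later batch with schema ["value", "id"] would be filtered on the
# wrong column, because value_idx was fixed at construction time.
```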

Re: output_schema for ExecNode

2021-11-15 Thread Yue Ni
> This schema evolution work has not been done in the scanner yet so if
> you are interested you might want to look at ARROW-11003[1].

Thanks. I will keep an eye on it.

> Or did you have a different use case for multiple schemas in mind that
> doesn't quite fit the "promote to common schema" case
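One way to read "promote to common schema" is: take the union of all fields seen across the inputs and pad each batch with nulls for the fields it lacks. A minimal plain-Python sketch of that idea (not the ARROW-11003 design itself; `unify_schemas` and `promote` are hypothetical names):

```python
# Conceptual sketch: unify several per-batch schemas into one common
# schema, then promote each batch to it by null-filling missing fields.

def unify_schemas(schemas: list[list[str]]) -> list[str]:
    """Union of field names, preserving first-seen order."""
    common: list[str] = []
    for schema in schemas:
        for name in schema:
            if name not in common:
                common.append(name)
    return common

def promote(batch: dict, common: list[str]) -> dict:
    """Reshape one batch to the common schema, padding with nulls."""
    n_rows = len(next(iter(batch.values())))
    return {name: batch.get(name, [None] * n_rows) for name in common}

b1 = {"id": [1, 2]}
b2 = {"id": [3], "score": [0.5]}
common = unify_schemas([list(b1), list(b2)])  # ["id", "score"]
promoted = [promote(b, common) for b in (b1, b2)]
# promoted[0] -> {"id": [1, 2], "score": [None, None]}
```

After promotion every batch shares one schema, so downstream nodes can bind expressions once, as today.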

Re: output_schema for ExecNode

2021-11-15 Thread Weston Pace
> "The consumer of a Scan does not need to know how it is implemented,
> only that a uniform API is provided to obtain the next RecordBatch
> with a known schema."

I interpret this as `Scan` operator may produce multiple RecordBatches, and each of them should have a known schema, but next batc
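The interpretation above, that each individual batch from a scan has a known schema but consecutive batches need not share one, can be sketched as a generator in plain Python (a conceptual model only; `scan` and the fragment shape are hypothetical, not the Arrow scanner API):

```python
# Conceptual sketch: a Scan-like source yields (schema, batch) pairs.
# Every yielded batch has a known schema, but two consecutive batches
# are not guaranteed to have the same one.
from typing import Iterator

def scan(fragments: list[dict]) -> Iterator[tuple[list[str], dict]]:
    for batch in fragments:
        yield list(batch), batch  # schema for *this* batch, then the data

fragments = [{"id": [1, 2]}, {"id": [3], "score": [0.5]}]
schemas = [schema for schema, _ in scan(fragments)]
# schemas -> [["id"], ["id", "score"]]
```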

output_schema for ExecNode

2021-11-14 Thread Yue Ni
Hi there, I am evaluating the Apache Arrow C++ compute engine for my project, and I wonder what the schema assumption is for execution operators in the compute engine. In my use case, multiple record batches for computation may have different schemas. I read the Apache Arrow Query Engine for C++ design