vbarua opened a new pull request, #13127: URL: https://github.com/apache/datafusion/pull/13127
## Which issue does this PR close?

Follows up from https://github.com/apache/datafusion/pull/12495

Closes https://github.com/apache/datafusion/issues/12347

## Rationale for this change

Substrait relations have an [emit_kind](https://github.com/substrait-io/substrait/blob/683f4179a058c2c99c04501b920a48ff372356ff/proto/substrait/algebra.proto#L15-L22) which is either Direct, in which case the default fields of the relation are output, or Emit, which enables precise control of the order and inclusion of fields.

For example, given a relation with the following emit

```json
"emit": { "outputMapping": [2, 0, 1] }
```

The output mapping indicates that from the default columns output from the relation, only the 2nd, 0th and 1st column should be output (in that order).

DataFusion currently ignores the emit_kind field entirely when reading Substrait plans.

## What changes are included in this PR?

This PR adds support for handling output mappings by treating them as DataFusion Projections that are layered on top of the default translation of the relation.

The one exception to this is Substrait Project, for which special handling has been added to avoid creating a Projection on top of a Projection.

## Are these changes tested?

Yes. Two new tests have been added to check the remap logic.

Additionally, DataFusion currently includes output mappings when it produces Substrait Projects, so any test which roundtrips a Projection also serves as a test of this functionality.

## Are there any user-facing changes?

Substrait plans generated by DataFusion prior to version 0.42 did not set the output mapping correctly for Substrait Projects (see https://github.com/apache/datafusion/pull/12495 for details). After these changes, attempting to consume Substrait plans generated before version 0.42 will not work.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
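As an illustration of the emit semantics described in the rationale above, here is a minimal sketch in plain Rust. It is not the code from this PR and does not use DataFusion or Substrait types: the `EmitKind` enum and the `apply_emit` function are hypothetical names introduced only for this example, and the out-of-bounds error handling is an assumption about how an invalid mapping might be rejected.

```rust
/// Hypothetical stand-in for a Substrait relation's emit kind.
/// (Illustrative only; not the generated Substrait protobuf type.)
enum EmitKind {
    /// Output the relation's default fields unchanged.
    Direct,
    /// Output only the listed default-field indices, in the listed order.
    Emit(Vec<usize>),
}

/// Hypothetical helper: computes the output columns of a relation from its
/// default columns and its emit kind. An `Emit` mapping behaves like a
/// projection layered on top of the default output.
fn apply_emit<T: Clone>(default_columns: &[T], emit: &EmitKind) -> Result<Vec<T>, String> {
    match emit {
        // Direct: pass the default columns through as-is.
        EmitKind::Direct => Ok(default_columns.to_vec()),
        // Emit: pick columns by index; an index may be reordered or omitted.
        EmitKind::Emit(output_mapping) => output_mapping
            .iter()
            .map(|&idx| {
                default_columns.get(idx).cloned().ok_or_else(|| {
                    // Assumed behavior: reject indices past the default output.
                    format!(
                        "output mapping index {idx} is out of bounds for {} columns",
                        default_columns.len()
                    )
                })
            })
            .collect(),
    }
}
```

With default columns `["a", "b", "c"]`, the mapping `[2, 0, 1]` from the example above selects `["c", "a", "b"]`, while `EmitKind::Direct` returns the columns unchanged.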
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org