kosiew commented on code in PR #18998:
URL: https://github.com/apache/datafusion/pull/18998#discussion_r2576301787
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -270,10 +233,12 @@ impl FilterCandidateBuilder {
.project(&required_indices_into_table_schema)?,
);
- let (schema_mapper, projection_into_file_schema) = self
- .schema_adapter_factory
- .create(Arc::clone(&projected_table_schema), self.table_schema)
- .map_schema(&self.file_schema)?;
+ // Compute the projection into the file schema by matching column names
+ let projection_into_file_schema: Vec<usize> = projected_table_schema
+ .fields()
+ .iter()
+ .filter_map(|f| self.file_schema.index_of(f.name()).ok())
+ .collect();
Review Comment:
This assumes predicate column indices refer to the table/logical schema (not
file schema), and that we always map those logical columns into file indices by
name.
That's true today because DefaultPhysicalExprAdapter rewrites expressions
but preserves logical column indices and we do name-based mapping here.
Do you think adding:
1. a regression test for the reorder case (table: (a,b,c), file: (b,c,a)),
ensuring both pruning and row-filtering work
2. an explicit assert and comment in [FilterCandidateBuilder::build]
verifying that each required logical idx resolves to a unique file index or is
intentionally missing and handled by adapter.
would prevent regression?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]