blaginin commented on code in PR #14685:
URL: https://github.com/apache/datafusion/pull/14685#discussion_r1974936267
##########
datafusion/core/src/datasource/physical_plan/file_scan_config.rs:
##########
@@ -345,6 +345,32 @@ impl FileScanConfig {
     /// Set the projection of the files
     pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
         self.projection = projection;
+        self.with_updated_statistics()
+    }
+
+    // Update source statistics with the current projection data
+    fn with_updated_statistics(mut self) -> Self {
+        let max_projection_column = *self
+            .projection
+            .as_ref()
+            .and_then(|proj| proj.iter().max())
+            .unwrap_or(&0);
+
+        if max_projection_column
+            >= self.file_schema.fields().len() + self.table_partition_cols.len()
+        {
+            // we don't yet have enough information (file schema info or partition column info) to perform projection
+            return self;
+        }
+
+        let (
+            _projected_schema,
+            _constraints,
+            projected_statistics,
+            _projected_output_ordering,
+        ) = self.project();
+
+        self.source = self.source.with_statistics(projected_statistics);

Review Comment:
   That's a great idea! We can't use `self.source.statistics` directly, because the statistics have to match the projection we're applying, so I had to apply the projection first (https://github.com/apache/datafusion/pull/14685/commits/89ed225dcbe97ce9e9d1245d12e05637e3637f35)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
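
To illustrate the point in the review comment (statistics stored on the source have to line up with the projected columns, and projecting is only possible once every referenced column is known), here is a minimal standalone sketch. It does not use DataFusion's types: `ColumnStats` and `project_stats` are hypothetical names invented for this example, and the out-of-range handling is a simplification of the guard in the diff, not a copy of it.

```rust
/// Hypothetical, simplified stand-in for per-column statistics.
/// (Assumption: real statistics carry more than a null count.)
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStats {
    pub null_count: Option<usize>,
}

/// Returns the statistics reordered and filtered to match `projection`.
///
/// `all_columns` holds one entry per known column (file schema columns
/// followed by partition columns). The result is `None` when the projection
/// refers to a column index that is not known yet, which corresponds to the
/// "not enough information" early return in the diff.
pub fn project_stats(
    all_columns: &[ColumnStats],
    projection: Option<&[usize]>,
) -> Option<Vec<ColumnStats>> {
    match projection {
        // No projection: every column is kept in its original order.
        None => Some(all_columns.to_vec()),
        // Collecting `Option<ColumnStats>` items into `Option<Vec<_>>`
        // yields `None` as soon as one index is out of range.
        Some(indices) => indices
            .iter()
            .map(|&i| all_columns.get(i).cloned())
            .collect(),
    }
}
```

With three known columns, `project_stats(&cols, Some(&[2, 0][..]))` returns the statistics for columns 2 and 0 in that order, while a projection containing index 3 returns `None` so the caller can leave the existing statistics untouched.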