alamb commented on code in PR #14685:
URL: https://github.com/apache/datafusion/pull/14685#discussion_r1971726551


##########
datafusion/core/src/datasource/physical_plan/file_scan_config.rs:
##########
@@ -345,6 +345,32 @@ impl FileScanConfig {
     /// Set the projection of the files
     pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
         self.projection = projection;
+        self.with_updated_statistics()
+    }
+
+    // Update source statistics with the current projection data
+    fn with_updated_statistics(mut self) -> Self {
+        let max_projection_column = *self
+            .projection
+            .as_ref()
+            .and_then(|proj| proj.iter().max())
+            .unwrap_or(&0);
+
+        if max_projection_column
+            >= self.file_schema.fields().len() + self.table_partition_cols.len()
+        {
+            // we don't yet have enough information (file schema info or partition column info) to perform projection
+            return self;
+        }
+
+        let (
+            _projected_schema,
+            _constraints,
+            projected_statistics,
+            _projected_output_ordering,
+        ) = self.project();
+
+        self.source = self.source.with_statistics(projected_statistics);

Review Comment:
   I don't fully understand why the source would need projected statistics.
   
   I am testing whether the issue is that the FileScanConfig is providing the wrong statistics (for example, maybe this line should be `self.statistics` rather than `self.source.statistics`):
   
   
https://github.com/apache/datafusion/blob/1c54b38e4a4012fd8d1b4f48e2c3d6d35016bad0/datafusion/core/src/datasource/physical_plan/file_scan_config.rs#L233-L232
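
For reference, here is a minimal standalone sketch of the bounds check that the diff above performs before projecting. It is not the DataFusion code: `projection_in_bounds` and its plain `usize` arguments are hypothetical stand-ins for `self.projection`, `self.file_schema.fields().len()`, and `self.table_partition_cols.len()`. It only illustrates when the guard would skip updating the statistics.

```rust
/// Hypothetical stand-in for the guard in `with_updated_statistics`.
/// Returns true when the largest projected index fits within the
/// file columns plus the partition columns.
fn projection_in_bounds(
    projection: Option<&[usize]>,
    file_cols: usize,
    partition_cols: usize,
) -> bool {
    // As in the diff, a missing or empty projection falls back to index 0.
    let max_projection_column = projection
        .and_then(|proj| proj.iter().max().copied())
        .unwrap_or(0);

    max_projection_column < file_cols + partition_cols
}

fn main() {
    // Index 2 is within a 3-column file schema.
    assert!(projection_in_bounds(Some(&[0, 2][..]), 3, 0));
    // Index 3 refers to the single partition column after 3 file columns.
    assert!(projection_in_bounds(Some(&[3][..]), 3, 1));
    // Index 4 is past the end, so the guard would return early.
    assert!(!projection_in_bounds(Some(&[4][..]), 3, 1));
    // With no projection and no columns known yet, the fallback index 0
    // is still out of bounds, so the guard would also return early.
    assert!(!projection_in_bounds(None, 0, 0));
}
```

In this sketch, the last case suggests that a config whose schema has not been populated yet would leave the source statistics untouched, which matches the early `return self` in the diff.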



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

