xudong963 commented on code in PR #15539:
URL: https://github.com/apache/datafusion/pull/15539#discussion_r2036278558


##########
datafusion/datasource/src/statistics.rs:
##########
@@ -410,23 +410,24 @@ pub async fn get_statistics_with_limit(
 }
 
 /// Generic function to compute statistics across multiple items that have statistics
-fn compute_summary_statistics<T, I>(
+/// If `items` is empty or all items don't have statistics, it returns `None`.

Review Comment:
   > If you don't have a schema, how can you even try to compute a statistics, 
I couldn't imagine also that
   
   `compute_summary_statistics` only summarizes statistics that already exist, so it does not need a schema.
   
   A caller of `compute_summary_statistics`, such as `compute_file_group_statistics`, does not need to do anything when it returns `None`, because the statistics of a `FileGroup` already default to `None`:
   ```rust
   pub struct FileGroup {
       /// The files in this group
       files: Vec<PartitionedFile>,
       /// Optional statistics for the data across all files in the group
       statistics: Option<Arc<Statistics>>,
   }
   ```
   
   It serves as a base method: each caller is free to handle the return value as it sees fit, and the method does not impose extra requirements on callers (e.g., requiring a schema).
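   
   To illustrate the caller pattern described above, here is a minimal sketch. It is not the code in the PR: `Statistics` and `PartitionedFile` are simplified, hypothetical stand-ins (row count only), and the function bodies are assumptions made for illustration.
   
   ```rust
   use std::sync::Arc;
   
   /// Hypothetical, simplified stand-in for DataFusion's `Statistics`
   /// (only a row count, to keep the sketch small).
   #[derive(Debug, Clone, PartialEq)]
   pub struct Statistics {
       pub num_rows: usize,
   }
   
   /// Hypothetical, simplified stand-in for `PartitionedFile`.
   pub struct PartitionedFile {
       pub statistics: Option<Arc<Statistics>>,
   }
   
   pub struct FileGroup {
       /// The files in this group
       pub files: Vec<PartitionedFile>,
       /// Optional statistics for the data across all files in the group
       pub statistics: Option<Arc<Statistics>>,
   }
   
   /// Summarizes whatever statistics are present. No schema is involved.
   /// Returns `None` if `items` is empty or no item carries statistics.
   pub fn compute_summary_statistics<'a, I>(items: I) -> Option<Statistics>
   where
       I: IntoIterator<Item = &'a PartitionedFile>,
   {
       items
           .into_iter()
           // Skip items without statistics.
           .filter_map(|f| f.statistics.as_deref())
           .map(|s| s.num_rows)
           // Stay `None` until at least one item contributes a value.
           .fold(None, |acc: Option<usize>, n| Some(acc.unwrap_or(0) + n))
           .map(|num_rows| Statistics { num_rows })
   }
   
   /// The caller only acts on `Some`; on `None` the group keeps its
   /// existing `statistics`, which default to `None`.
   pub fn compute_file_group_statistics(mut group: FileGroup) -> FileGroup {
       if let Some(stats) = compute_summary_statistics(&group.files) {
           group.statistics = Some(Arc::new(stats));
       }
       group
   }
   ```
   
   In this sketch, a group with no files, or whose files all lack statistics, comes back with `statistics` still `None`, so the caller needs no special handling for that case.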
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

