xudong963 commented on code in PR #15539: URL: https://github.com/apache/datafusion/pull/15539#discussion_r2036278558
##########
datafusion/datasource/src/statistics.rs:
##########
@@ -410,23 +410,24 @@ pub async fn get_statistics_with_limit(
 }
 
 /// Generic function to compute statistics across multiple items that have statistics
-fn compute_summary_statistics<T, I>(
+/// If `items` is empty or all items don't have statistics, it returns `None`.

Review Comment:

> If you don't have a schema, how can you even try to compute a statistics, I couldn't imagine also that

`compute_summary_statistics` only summarizes existing statistics, so it doesn't need a schema.

For a caller such as `compute_file_group_statistics`, nothing needs to be done when `compute_summary_statistics` returns `None`, because the statistics of a `FileGroup` already default to `None`:

```rust
pub struct FileGroup {
    /// The files in this group
    files: Vec<PartitionedFile>,
    /// Optional statistics for the data across all files in the group
    statistics: Option<Arc<Statistics>>,
}
```

It is used as a base method, so its callers are free to handle the return value however they need, without extra restrictions such as requiring a schema.
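
To make the point concrete, here is a minimal sketch of the pattern, not the actual DataFusion code. `Statistics` is reduced to two counters, `PartitionedFile` and `FileGroup` are simplified stand-ins, and the `new` constructors and the `statistics()` accessor are hypothetical helpers added only for illustration. The real signatures in `statistics.rs` may differ.

```rust
use std::sync::Arc;

/// Simplified stand-in for DataFusion's `Statistics` (assumption: two counters only).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Statistics {
    pub num_rows: usize,
    pub total_byte_size: usize,
}

/// Simplified stand-in for `PartitionedFile`.
#[derive(Debug, Clone)]
pub struct PartitionedFile {
    pub path: String,
    pub statistics: Option<Arc<Statistics>>,
}

impl PartitionedFile {
    /// Hypothetical helper constructor, for illustration only.
    pub fn new(path: &str, statistics: Option<Statistics>) -> Self {
        Self { path: path.to_string(), statistics: statistics.map(Arc::new) }
    }
}

#[derive(Debug, Default)]
pub struct FileGroup {
    files: Vec<PartitionedFile>,
    statistics: Option<Arc<Statistics>>,
}

impl FileGroup {
    /// Hypothetical helper constructor: statistics start out as `None`.
    pub fn new(files: Vec<PartitionedFile>) -> Self {
        Self { files, statistics: None }
    }

    /// Hypothetical accessor for the group-level statistics.
    pub fn statistics(&self) -> Option<&Statistics> {
        self.statistics.as_deref()
    }
}

/// Summarizes whatever statistics the items carry; no schema is involved.
/// Returns `None` if `items` is empty or no item has statistics.
pub fn compute_summary_statistics<T, I>(
    items: I,
    stats_extractor: impl Fn(&T) -> Option<&Statistics>,
) -> Option<Statistics>
where
    I: IntoIterator<Item = T>,
{
    let mut summary: Option<Statistics> = None;
    for item in items {
        if let Some(stats) = stats_extractor(&item) {
            // Create the accumulator lazily, on the first item that has statistics.
            let acc = summary.get_or_insert_with(Statistics::default);
            acc.num_rows += stats.num_rows;
            acc.total_byte_size += stats.total_byte_size;
        }
    }
    summary
}

/// The caller only acts on `Some`; on `None` the group keeps its default `None`.
pub fn compute_file_group_statistics(mut group: FileGroup) -> FileGroup {
    let summary =
        compute_summary_statistics(group.files.iter(), |file| file.statistics.as_deref());
    if let Some(stats) = summary {
        group.statistics = Some(Arc::new(stats));
    }
    group
}
```

With this shape, an empty group or a group whose files all lack statistics passes through unchanged, and the schema never enters the picture at the summary level.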