xudong963 commented on code in PR #15289:
URL: https://github.com/apache/datafusion/pull/15289#discussion_r2007191299


##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -839,9 +839,10 @@ pub fn statistics_from_parquet_meta_calc(
         total_byte_size += row_group_meta.total_byte_size() as usize;
 
         if !has_statistics {
-            row_group_meta.columns().iter().for_each(|column| {

Review Comment:
   The key issue is how `has_statistics` is assigned within the `for_each` 
loop. During iteration, the value of `has_statistics` is overwritten by each 
column's statistics status.
   
   ### Example Scenario
   
   Assume there are 2 columns with the following statistics status:
   
   - Column 1: Has statistics (`statistics` is `Some`)
   - Column 2: No statistics (`statistics` is `None`)
   
   Execution process:
   - Processing Column 1: `has_statistics` becomes `true`
   - Processing Column 2: `has_statistics` becomes `false`
   
   If the last column processed has no statistics, the final value of 
`has_statistics` is `false`, even though Column 1 does have statistics.
   
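   If the goal is to check "whether any column has statistics", a more suitable 
approach would be to use `any` on the iterator, which returns `true` as soon as 
one column has statistics, regardless of column order.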
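
   A minimal sketch of the difference, for illustration only. `Column` here is 
a hypothetical stand-in struct, not the Parquet column metadata type, and the 
closure body in the first function is an assumption about the overwrite pattern 
described above rather than a copy of the code in `file_format.rs`.

```rust
/// Hypothetical stand-in for a column chunk; it only records whether
/// statistics are present (not the real Parquet metadata type).
struct Column {
    statistics: Option<i64>,
}

/// Overwrite pattern (assumed shape of the problem): every iteration
/// assigns the flag, so the result reflects only the last column.
fn last_column_has_statistics(columns: &[Column]) -> bool {
    let mut has_statistics = false;
    columns.iter().for_each(|column| {
        has_statistics = column.statistics.is_some();
    });
    has_statistics
}

/// `any` pattern: true if at least one column has statistics,
/// independent of the order in which columns are visited.
fn any_column_has_statistics(columns: &[Column]) -> bool {
    columns.iter().any(|column| column.statistics.is_some())
}

/// The two-column scenario from the example: Column 1 has statistics,
/// Column 2 does not.
fn example_columns() -> Vec<Column> {
    vec![
        Column { statistics: Some(42) },
        Column { statistics: None },
    ]
}
```

   With `example_columns()`, the overwrite version reports `false` because 
Column 2 is processed last, while the `any` version reports `true`.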



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

