xudong963 commented on code in PR #15289:
URL: https://github.com/apache/datafusion/pull/15289#discussion_r2007912619
##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -839,9 +839,10 @@ pub fn statistics_from_parquet_meta_calc(
         total_byte_size += row_group_meta.total_byte_size() as usize;

         if !has_statistics {
-            row_group_meta.columns().iter().for_each(|column| {

Review Comment:
   Yes. Because `even if a column in table_schema doesn't have the corresponding statistics in row_group_meta, it may still need to go through summarize_min_max_null_counts to set its statistics with null`, I don't think it is worth trying. In fact, I have tried it, and it doesn't make much sense: `row_group_meta.columns()` and `table_schema.fields()` are not always aligned.
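To illustrate the alignment problem described above, here is a minimal, self-contained sketch. It does not use the real DataFusion or parquet types: `ColumnStats`, `FieldSummary`, and `summarize_by_table_schema` are hypothetical stand-ins, and treating a missing column as "all rows null" is an assumption made for the example, not a statement of what `summarize_min_max_null_counts` does. The point is only that the loop is driven by the table schema and looks columns up by name, so a field absent from the row group still gets an entry.

```rust
/// Hypothetical stand-in for the statistics of one column chunk in a row group.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    name: String,
    min: i64,
    max: i64,
    null_count: usize,
}

/// Hypothetical per-field summary, one entry per table schema field.
#[derive(Debug, Clone, PartialEq)]
struct FieldSummary {
    min: Option<i64>,
    max: Option<i64>,
    null_count: usize,
}

/// Walk the table schema (not the row-group columns) so that every field
/// gets a summary, even when the file has no matching column.
fn summarize_by_table_schema(
    table_fields: &[&str],
    row_group_columns: &[ColumnStats],
    num_rows: usize,
) -> Vec<FieldSummary> {
    table_fields
        .iter()
        .map(|field| {
            // Look up by name: positions in the two lists may differ.
            match row_group_columns
                .iter()
                .find(|c| c.name.as_str() == *field)
            {
                Some(c) => FieldSummary {
                    min: Some(c.min),
                    max: Some(c.max),
                    null_count: c.null_count,
                },
                // Assumption for this sketch: a field missing from the file
                // has no min/max and is counted as null for every row.
                None => FieldSummary {
                    min: None,
                    max: None,
                    null_count: num_rows,
                },
            }
        })
        .collect()
}
```

With table fields `["a", "b", "c"]` and row-group columns stored as `[c, a]`, iterating over the row-group columns by position would attach `c`'s statistics to `a` and never visit `b`. Iterating over the table fields yields three entries in schema order, with `b` filled in as null.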