CookiePieWw commented on issue #15976: URL: https://github.com/apache/datafusion/issues/15976#issuecomment-2920132245
Hi :) I've spent some time on this and found the problem in `get_col_stats`:
https://github.com/apache/datafusion/blob/2c2f225926958b6abf06b01fcfb594017531043c/datafusion/datasource-parquet/src/file_format.rs#L1101-L1107

Here we always wrap the stats in `Precision::Exact`. We should instead respect the `is_max_value_exact` and `is_min_value_exact` flags in the column metadata. The max and min values are extracted here:
https://github.com/apache/datafusion/blob/2c2f225926958b6abf06b01fcfb594017531043c/datafusion/datasource-parquet/src/file_format.rs#L1112-L1139

However, I couldn't find a method in `StatisticsConverter` that exposes the `is_*_value_exact` flags. So my plan is:

1. Add a function similar to `row_group_mins` to the converter that extracts the flags. This requires a change to `arrow-rs` first.
2. Collect the extracted boolean array of flags and pass it to `get_col_stats`, which then decides whether to use `Precision::Exact` or `Precision::Inexact`.

Please let me know if this direction makes sense.
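The following is a minimal, self-contained Rust sketch of the proposed `get_col_stats` decision. It rests on two assumptions. First, `Precision` below is a local stand-in that mirrors the `Exact`/`Inexact`/`Absent` variants of DataFusion's `datafusion_common::stats::Precision`. Second, `combine_row_group_bounds` is a hypothetical helper, not an existing DataFusion or arrow-rs API.

The helper folds per-row-group bounds into a file-level bound. It reports `Exact` only when every row group's flag is `true` and otherwise falls back to `Inexact`. This is a conservative rule.

```rust
/// Local stand-in mirroring the variants of DataFusion's
/// `datafusion_common::stats::Precision` (simplified; assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical helper: combine per-row-group min (or max) values into a
/// file-level value. `is_exact[i]` plays the role of the row group's
/// `is_min_value_exact` / `is_max_value_exact` flag. The result is `Exact`
/// only if every flag is true; otherwise it is `Inexact`.
/// Assumes every row group reported a value; no values yields `Absent`.
fn combine_row_group_bounds<T, F>(values: &[T], is_exact: &[bool], pick: F) -> Precision<T>
where
    T: Copy,
    F: Fn(T, T) -> T,
{
    assert_eq!(values.len(), is_exact.len(), "expected one flag per row group");
    let Some((&first, rest)) = values.split_first() else {
        return Precision::Absent;
    };
    // `pick` is `std::cmp::min` for minimums, `std::cmp::max` for maximums.
    let combined = rest.iter().fold(first, |acc, &v| pick(acc, v));
    if is_exact.iter().all(|&e| e) {
        Precision::Exact(combined)
    } else {
        Precision::Inexact(combined)
    }
}

fn main() {
    let mins = [3_i64, 1, 7];
    let maxes = [10_i64, 42, 9];
    let min_exact = [true, true, true];
    // Suppose row group 1's max is only an upper bound, not the true max.
    let max_exact = [true, false, true];

    let file_min = combine_row_group_bounds(&mins, &min_exact, std::cmp::min);
    let file_max = combine_row_group_bounds(&maxes, &max_exact, std::cmp::max);
    println!("min = {file_min:?}, max = {file_max:?}");
}
```

Writers typically clear these flags when they truncate long byte-array statistics. In that case an `Inexact` max is still an upper bound that can be used for pruning, but it should not be reported as the column's true maximum.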