CookiePieWw commented on issue #15976:
URL: https://github.com/apache/datafusion/issues/15976#issuecomment-2920132245

   Hi :) I've spent some time on this and found the problem in `get_col_stats`
   
https://github.com/apache/datafusion/blob/2c2f225926958b6abf06b01fcfb594017531043c/datafusion/datasource-parquet/src/file_format.rs#L1101-L1107
   Here we always wrap the stats in `Precision::Exact`, but we should instead 
respect the `is_max_value_exact` and `is_min_value_exact` flags in the 
column metadata.
   
   The max and min values are extracted at 
   
https://github.com/apache/datafusion/blob/2c2f225926958b6abf06b01fcfb594017531043c/datafusion/datasource-parquet/src/file_format.rs#L1112-L1139
   I couldn't find a way to access the `..exact` flags through 
`StatisticsConverter`, so my plan is to first add a function to the converter, 
similar to `row_group_mins`, that extracts the flags. This requires a change 
to `arrow-rs` first. The extracted boolean array of flags would then be 
collected and passed to `get_col_stats` to decide whether to use 
`Precision::Exact` or `Precision::Inexact`.
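
   To make the idea concrete, here is a minimal, self-contained sketch of how 
the flags could drive the choice of precision. It is not the actual DataFusion 
code: the `Precision` enum below is a simplified local stand-in, 
`file_level_max` is a hypothetical helper, and plain slices stand in for the 
Arrow arrays. It also assumes the conservative rule that the result is exact 
only if every row group reports an exact max, and that a missing flag or a 
missing value is treated as "not exact" or "absent" respectively.

```rust
/// Simplified local stand-in for DataFusion's `Precision` (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical helper: fold per-row-group max values and their exactness
/// flags into one file-level statistic.
///
/// Assumptions made for this sketch:
/// - `maxes[i]` and `exact_flags[i]` describe the same row group;
/// - a row group without a max makes the file-level max unknown;
/// - a missing flag (`None`) is conservatively treated as inexact.
fn file_level_max(maxes: &[Option<i64>], exact_flags: &[Option<bool>]) -> Precision<i64> {
    if maxes.is_empty() || maxes.len() != exact_flags.len() {
        return Precision::Absent;
    }

    let mut overall: Option<i64> = None;
    let mut all_exact = true;

    for (max, flag) in maxes.iter().zip(exact_flags) {
        match max {
            Some(v) => {
                // Track the largest value seen so far.
                overall = Some(overall.map_or(*v, |cur| cur.max(*v)));
                // One inexact (or unknown) flag downgrades the whole result.
                all_exact &= flag.unwrap_or(false);
            }
            None => return Precision::Absent,
        }
    }

    match overall {
        Some(v) if all_exact => Precision::Exact(v),
        Some(v) => Precision::Inexact(v),
        None => Precision::Absent,
    }
}
```

   With this rule, `file_level_max(&[Some(3), Some(7)], &[Some(true), Some(false)])` 
yields `Precision::Inexact(7)`, while the same values with both flags set to 
`Some(true)` yield `Precision::Exact(7)`. The min side would mirror this.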
   
   Please let me know if this direction makes sense.

