orlp commented on PR #514:
URL: https://github.com/apache/parquet-format/pull/514#issuecomment-3183142450
> Am I understanding correctly that the change to add nan_counts means that NaNs are now excluded from the min/max statistics? I believe this creates a new problem for engines that use total ordering, as the sign of the NaNs is not known, and therefore there are scenarios where predicates can't be pushed down where they could be in the absence of nan_counts.

The predicates can still be pushed down. If your engine uses total ordering with the predicate `col > c`, a page may contain matching rows when its statistics indicate `col_max > c || nan_count != 0`; otherwise the page can be skipped.
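A minimal sketch of that pruning rule, assuming min/max statistics that exclude NaNs and a separate per-page NaN count. `PageStats`, `total_order_key` and `page_may_match_gt` are hypothetical names for illustration, not a Parquet library API. Total ordering here follows IEEE 754 totalOrder, so -0.0 sorts before +0.0, negative NaNs sort below -inf and positive NaNs above +inf. Because the NaN sign is unknown, any NaN in the page forces the page to be kept.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageStats:
    """Hypothetical per-page statistics: min/max exclude NaNs, which are only counted."""
    min_value: Optional[float]  # None if the page has no non-null, non-NaN values
    max_value: Optional[float]
    nan_count: int


def total_order_key(x: float) -> int:
    """Map a float64 to a signed int whose ordering matches IEEE 754 totalOrder."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Negative floats (sign bit set): flip the low 63 bits so that larger
    # magnitudes sort lower; non-negative floats already sort correctly.
    return bits ^ 0x7FFF_FFFF_FFFF_FFFF if bits < 0 else bits


def page_may_match_gt(stats: PageStats, c: float) -> bool:
    """False only if the statistics prove no row satisfies `col > c` under total ordering."""
    if stats.nan_count != 0:
        # The NaNs might be positive, and a positive NaN sorts above every number.
        return True
    if stats.max_value is None:
        return False  # no non-null values in the page
    return total_order_key(stats.max_value) > total_order_key(c)
```

The predicate `col < c` is symmetric: keep the page when `col_min < c || nan_count != 0`, since a negative NaN would sort below every number.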
