JFinis commented on PR #221: URL: https://github.com/apache/parquet-format/pull/221#issuecomment-2943678700
> There is actually a problem with the singular NaN count for data systems which use IEEE 754 total ordering (such as datafusion), they would need two counts for efficient page filtering in the face of NaNs: one for positive NaNs and one for negative NaNs.

I don't think that's a big problem. It just means that if the system needs to include either -NaN or +NaN in a query, any page that has a non-zero `nan_count` has to be scanned. Yes, that might mean that you scan a page in vain if you're only looking for, say, +NaN but the page happens to include only -NaN, but this seems to be a rather small problem.
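To make the trade-off concrete, here is a minimal sketch of the filtering rule described above. It assumes a reader that sees only a single per-page `nan_count`; the helper names (`page_may_contain_nan`, `find_nans`) and the in-memory page representation are hypothetical and are not part of the Parquet format or of any existing reader.

```python
import math
from typing import List, Optional


def is_negative_nan(x: float) -> bool:
    """True if x is a NaN whose sign bit is set."""
    return math.isnan(x) and math.copysign(1.0, x) < 0


def page_may_contain_nan(nan_count: Optional[int]) -> bool:
    """Decide from the single count whether the page has to be scanned.

    A missing count (None) is treated conservatively as "may contain NaN".
    """
    return nan_count is None or nan_count > 0


def find_nans(values: List[float], nan_count: Optional[int],
              want_negative: bool) -> List[float]:
    """Return the NaNs of the requested sign, skipping the page if possible."""
    if not page_may_contain_nan(nan_count):
        return []  # page skipped without looking at the values
    # The count does not record the sign, so the values must be inspected.
    return [v for v in values
            if math.isnan(v) and is_negative_nan(v) == want_negative]


# Build NaNs with an explicit sign bit.
pos_nan = math.copysign(float("nan"), 1.0)
neg_nan = math.copysign(float("nan"), -1.0)

# The page holds only -NaN, but the query asks for +NaN: the non-zero count
# forces a scan that finds nothing, which is the "scan in vain" case.
wasted_scan = find_nans([1.0, neg_nan], nan_count=1, want_negative=False)
useful_scan = find_nans([1.0, neg_nan], nan_count=1, want_negative=True)
```

With two separate counts the first call could have skipped the page. With one count the cost is a single unnecessary scan, and the result is still correct.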
