orlp commented on PR #514:
URL: https://github.com/apache/parquet-format/pull/514#issuecomment-3183221342
> I seem to remember from a previous discussion around the need to have signed NaNs to allow for predicate pushdown, but I can't remember the example.

Signed NaN counts are only useful to avoid treating a `-NaN` as a false positive for the `col > c` case in engines that use a total ordering. To give a concrete example, if the page contains `[-NaN, -3, 42]` and the predicate is `col > 100`, then such an engine could skip this page with the statistics `total_min = -NaN, total_max = 42`, but not with `min = -3, max = 42, nan_count = 1`, because that single NaN could be a positive NaN, which would match `+NaN > 100`. Thus, if signed NaN counts are unavailable, *any* NaN, regardless of sign, has to be treated as an extremal value by those engines.

**But only having a single NaN count doesn't block predicate pushdown whatsoever**, and since you're fine with NaN poisoning anyway I don't think you'd care about the above `-NaN` edge case.
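To make the `[-NaN, -3, 42]` example concrete, here is a minimal Python sketch of the two skip decisions for `col > c`. It is only an illustration, not anything from the spec or the PR: the helper names (`total_order_key`, `can_skip_total_order`, `can_skip_single_nan_count`) are hypothetical, the key function emulates an IEEE 754 totalOrder-style comparison through the float's bit pattern, and `c` is assumed to be a non-NaN constant.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Hypothetical helper: map a float to an int that sorts like IEEE 754 totalOrder."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]  # signed 64-bit view of the double
    # For sign-bit-set floats, flip the 63 magnitude bits so larger magnitudes sort lower.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits


def can_skip_total_order(total_max: float, c: float) -> bool:
    """Skip a page for `col > c` when even the total-order maximum does not exceed c."""
    return total_order_key(total_max) <= total_order_key(c)


def can_skip_single_nan_count(max_non_nan: float, nan_count: int, c: float) -> bool:
    """Skip a page for `col > c` given NaN-free min/max plus one unsigned NaN count.

    Any counted NaN might be +NaN, which sorts above every c, so it blocks skipping.
    """
    return nan_count == 0 and max_non_nan <= c


neg_nan = math.copysign(math.nan, -1.0)
page = [neg_nan, -3.0, 42.0]

# Total-order statistics: -NaN is the minimum, 42 is the maximum.
total_min = min(page, key=total_order_key)
total_max = max(page, key=total_order_key)

# NaN-free statistics with a single, unsigned NaN count.
non_nan = [v for v in page if not math.isnan(v)]
nan_count = len(page) - len(non_nan)

skip_total = can_skip_total_order(total_max, 100.0)
skip_single = can_skip_single_nan_count(max(non_nan), nan_count, 100.0)
print(skip_total, skip_single)  # True False
```

The page is skippable in the first case because the only NaN is known to sit at the bottom of the order; in the second case the sign is unknown, so the NaN has to be assumed extremal.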
