orlp commented on PR #514:
URL: https://github.com/apache/parquet-format/pull/514#issuecomment-3183221342

   > I seem to remember from a previous discussion around the need to have 
signed NaNs to allow for predicate pushdown, but I can't remember the example.
   
   Signed NaN counts are only useful for letting total-ordering engines avoid 
treating a `-NaN` as a false positive in the `col > c` case. To give a concrete 
example, if the page contains `[-NaN, -3, 42]` and the predicate is `col > 100`, 
then such an engine could skip this page with statistics `total_min = -NaN, 
total_max = 42` but not with `min = -3, max = 42, nan_count = 1`, because that 
single NaN could be a positive NaN, and `+NaN > 100` matches under total 
ordering. Thus, if signed NaN counts are unavailable, *any* NaN, regardless of 
sign, has to be treated as an extremal value by those engines.
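
   To make the example concrete, here is a rough Python sketch of the two 
skip decisions for `col > c`. The function names and the shape of the statistics 
are hypothetical, not anything from the Parquet spec or an existing reader; the 
key function is the usual bit trick for IEEE 754 total ordering.

```python
import math
import struct

def total_order_key(x: float) -> int:
    """Map a double to an int whose ordering matches IEEE 754 totalOrder."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]  # reinterpret as signed 64-bit
    # Negative floats (sign bit set) order in reverse of their magnitude bits.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits

def can_skip_gt_total(total_max: float, c: float) -> bool:
    """Hypothetical check: skip when the total-order max is not above c."""
    return total_order_key(total_max) <= total_order_key(c)

def can_skip_gt_unsigned_nan(max_: float, nan_count: int, c: float) -> bool:
    """Hypothetical check with NaN-free min/max plus a single, unsigned nan_count."""
    if nan_count > 0:
        # The sign is unknown, so the NaN might be +NaN, which sorts above any c.
        return False
    return total_order_key(max_) <= total_order_key(c)

neg_nan = math.copysign(math.nan, -1.0)
page = [neg_nan, -3.0, 42.0]

# Total-order statistics: -NaN sorts below everything, so it becomes the min.
total_min = min(page, key=total_order_key)
total_max = max(page, key=total_order_key)

skip_with_total_stats = can_skip_gt_total(total_max, 100.0)
skip_with_nan_count = can_skip_gt_unsigned_nan(42.0, 1, 100.0)
```

   With total-order statistics the page is skipped because `total_max = 42`; 
with only `nan_count = 1` the sketch has to keep the page, which is the false 
positive described above.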
   
   **But only having a single NaN count doesn't block predicate pushdown 
whatsoever**, and since you're fine with NaN poisoning anyway I don't think 
you'd care about the above `-NaN` edge case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
