JFinis commented on PR #514:
URL: https://github.com/apache/parquet-format/pull/514#issuecomment-3183262288

   > I seem to remember from a previous discussion around the need to have 
signed NaNs to allow for predicate pushdown, but I can't remember the example.
   > 
   > Regardless my broader point still stands that we are adding a non-trivial 
amount of complexity, there is a lot of subtlety both at read and write time, 
to yield an improvement that to be brutally honest I am not really sure will 
benefit all that many real workloads. It's all tradeoffs, if people want to and 
are motivated to proceed with the hybrid approach I don't feel strongly, but I 
personally prefer the simpler option that is more likely to be implemented 
correctly across the many many parquet implementations. _The proposal here is 
strictly more complex than the existing specification which implementations 
still implement incorrectly._
   
   Gut feeling wise, I agree with your assessment. I feel like I personally can 
implement this correctly in our engine, but I have now spent years of my life 
thinking about this, so I guess I'm not representative for the average Parquet 
maintainer. I also feel that this has such a high chance of being implemented 
incorrectly that the more straightforward solution of just IEEE total ordering 
would be preferable. My gut feels like the risk of added implementation 
complexity and thus chance for wrong implementations is higher than the 
advantage of not having "NaN poisoning" of statistics.
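
For readers unfamiliar with the two terms above, here is a minimal sketch, not taken from the PR or from any Parquet implementation. It assumes "NaN poisoning" means that a min/max statistic computed purely under IEEE 754 total ordering becomes NaN as soon as one NaN is present. The helper names `total_order_key` and `min_max_total_order` are hypothetical.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Hypothetical helper: integer key whose order matches IEEE 754 totalOrder for float64."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]  # reinterpret as signed int64
    # Negative floats have the sign bit set; flipping the other 63 bits
    # reverses their order so that larger magnitudes sort lower.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits


def min_max_total_order(values):
    """Min/max statistics computed purely under total ordering (illustrative only)."""
    return min(values, key=total_order_key), max(values, key=total_order_key)


# A positive quiet NaN built from its bit pattern, so the sign is unambiguous.
POS_NAN = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))[0]

clean = min_max_total_order([1.0, 3.0, 2.0])
poisoned = min_max_total_order([1.0, POS_NAN, 3.0])

print(clean)                    # (1.0, 3.0)
print(math.isnan(poisoned[1]))  # True: the max is NaN and no longer bounds the real values
```

The ordering itself is a few lines, which is the appeal of the simpler option. The cost is that a single positive NaN turns the max into NaN, so a reader can no longer use it to skip data for a predicate such as `x > 100`.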



