Implementing compute kernels that depend on hashing has raised a couple of edge cases that are worth discussing. The following points need to be resolved (I opened a JIRA [1] to track the fixes):

1. How to handle -0.0 and 0.0?
   - Option 1: Collapse them to a single value (this is more in line with the IEEE 754 spec, I believe, since the two compare equal)
   - Option 2: Keep them as separate values (I believe this is how Java handles them)

2. How to handle NaN?
   - Option 1: Do nothing with them (multiple distinct NaN values might occur in hash tables)
   - Option 2: Canonicalize to a single NaN (this is what Java does)

I haven't investigated how DB systems handle these (if anyone knows and can chime in, I would appreciate it). As a default, I think it might be nice to align the C++ implementation with the way Java handles them, but I don't have any strong opinions.

Thanks,
Micah

[1] https://issues.apache.org/jira/browse/ARROW-4497
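
P.S. To make the options concrete, here is a minimal sketch of what canonicalizing a double before hashing could look like. The names RawBits and CanonicalBits and the collapse_zeros flag are hypothetical, made up for this illustration; they are not existing Arrow functions. The sketch assumes the hash table keys on the 64-bit pattern of the value.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: reinterpret a double as its 64-bit pattern.
inline uint64_t RawBits(double v) {
  uint64_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return bits;
}

// Hypothetical helper: bit pattern to feed into the hash function.
// NaNs are always mapped to one quiet NaN (NaN option 2).
// collapse_zeros selects between zero option 1 (true) and option 2 (false).
inline uint64_t CanonicalBits(double v, bool collapse_zeros) {
  if (std::isnan(v)) {
    v = std::numeric_limits<double>::quiet_NaN();
  } else if (collapse_zeros && v == 0.0) {
    // -0.0 == 0.0 evaluates to true, so both zeros land here.
    v = 0.0;
  }
  return RawBits(v);
}
```

With collapse_zeros set to false this should behave like Java's Double.doubleToLongBits, which canonicalizes NaN but keeps -0.0 and 0.0 distinct. Skipping the isnan branch would correspond to the "do nothing" option for NaN.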