Implementing compute kernels that depend on hashing has raised a couple of
edge cases that are worth discussing.  In particular, the following points
need to be resolved (I opened a JIRA [1] to track the fixes):

1. How to handle -0.0 and 0.0?
- Option 1: Collapse them to a single value (I believe this is more in line
with the IEEE 754 spec)
- Option 2: Keep them as separate values (I believe this is how Java
handles them)
2. How to handle NaN?
- Option 1: Do nothing with them (multiple distinct NaN values might then
occur in hash tables)
- Option 2: Canonicalize to a single NaN (this is what Java does)
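
To make the options concrete, here is a rough sketch of what the
normalization step could look like before a double is hashed.  This is not
the Arrow implementation; CanonicalBits, HashDouble and the two flags are
hypothetical names used only for illustration.  Each flag set to true picks
Option 1 for zeros and Option 2 for NaN, respectively.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>

// Hypothetical helper (not an Arrow API): returns the bit pattern that
// would be used as the hash key for a double.
inline std::uint64_t CanonicalBits(double v, bool collapse_zeros,
                                   bool canonicalize_nan) {
  if (canonicalize_nan && std::isnan(v)) {
    // Replace any NaN payload with one fixed quiet NaN.
    v = std::numeric_limits<double>::quiet_NaN();
  }
  if (collapse_zeros && v == 0.0) {
    // -0.0 == 0.0 compares true, so this rewrites -0.0 as +0.0.
    v = 0.0;
  }
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof(bits));  // well-defined type punning
  return bits;
}

// Hypothetical helper: hash the normalized bit pattern.
inline std::size_t HashDouble(double v, bool collapse_zeros,
                              bool canonicalize_nan) {
  return std::hash<std::uint64_t>{}(
      CanonicalBits(v, collapse_zeros, canonicalize_nan));
}
```

Whichever options are chosen, the equality check used by the hash table
would need to compare the same normalized bits; otherwise values that hash
alike (or differently) would not agree with how they compare.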

I haven't investigated how DB systems handle these (if anyone knows and can
chime in I would appreciate it).  As a default, I think it might be nice to
align the C++ implementation with the way Java handles them, but I don't
have any strong opinions.

Thanks,
Micah

[1] https://issues.apache.org/jira/browse/ARROW-4497
