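If I understand your solution correctly, it has a problem in case 2: there are multiple underlying bit patterns that are all interpreted as NaN, so hashing the raw bits would not collapse them to a single value.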
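
To make that concrete, here is a minimal Python sketch (not Arrow code; the helper names `bits64`, `from_bits64`, and `hash_key` are made up for illustration). It shows that raw-bit hashing keeps -0.0 and 0.0 apart and also keeps NaNs with different payloads apart. It then shows one possible way to canonicalize NaN before hashing, using the same canonical pattern that Java's `Double.doubleToLongBits` returns for NaN.

```python
import math
import struct

# Canonical quiet-NaN pattern; Java's Double.doubleToLongBits returns this for any NaN.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def bits64(x: float) -> int:
    """Raw IEEE-754 binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits64(b: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a binary64 float."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def hash_key(x: float, collapse_zero: bool = False) -> int:
    """Hypothetical hash-key function: canonicalize NaN, optionally collapse -0.0 into 0.0."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    if collapse_zero and x == 0.0:  # -0.0 == 0.0 is True under IEEE-754 comparison
        return 0
    return bits64(x)


# Two quiet NaNs that differ only in their payload bits.
nan_a = from_bits64(0x7FF8000000000000)
nan_b = from_bits64(0x7FF8000000000001)

print(bits64(0.0) == bits64(-0.0))        # raw bits differ for the two zeros
print(bits64(nan_a) == bits64(nan_b))     # raw bits differ for the two NaNs
print(hash_key(nan_a) == hash_key(nan_b)) # canonicalized keys agree
```

With `collapse_zero=False` this matches the Java-like behavior discussed below (separate zeros, a single NaN); with `collapse_zero=True` it gives Option 1 for the zeros instead.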

On Tuesday, February 26, 2019, Ravindra Pindikura <ravin...@dremio.com> wrote:

> > On Feb 26, 2019, at 10:32 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > Implementing compute kernels that depend on hashing has raised a couple of
> > edge cases that are worth discussing. In particular
> > the following points need to be resolved (I opened a JIRA [1] to track the
> > fixes). In particular:
> >
> > 1. How to handle -0.0 and 0.0?
> >    - Option 1: Collapse to a single value (this is more inline with ieee-754
> >      spec I believe)
> >    - Option 2: Keep them as separate values (I believe this is how java
> >      handles them)
> > 2. How handle NaN?
> >    - Option 1: Do nothing with them (multiple values of NaN might occur in
> >      hashtables)
> >    - Option 2: Canonicalize to a single NaN (this is what java does)
> >
> > I haven't investigated how DB systems handle these (if anyone knows and can
> > chime in I would appreciate it). As a default, I think it might be nice to
> > align the C++ implementation with the way Java handles them, but I don't
> > have any strong opinions.
>
> I’m probably missing something obvious. But, why not use the raw
> 4-byte/8-byte value underneath (treat it as uint32/uint64) for the hashing?
> I’m assuming that will give 1 -> option 2, and 2 -> Option 2.
>
> >
> > Thanks,
> > Micah
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-4497