[Discuss][C++] Hashing floating point numbers

Micah Kornfield Tue, 26 Feb 2019 07:22:30 -0800

If I understand your solution correctly, the problem in case 2 is that
there are multiple underlying bit values that are all interpreted as NaN.
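
To make both cases concrete, here is a minimal C++ sketch. It is not
Arrow's implementation; the helper names (RawBits, NanWithPayload,
Canonicalize, HashRaw, HashCanonical, DemoChecks) are hypothetical and
exist only for illustration. It assumes IEEE 754 binary64 doubles. The
sketch collapses -0.0 and 0.0 (case 1, Option 1) and maps every NaN to a
single quiet NaN (case 2, Option 2) before hashing the raw bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>

// Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing issues).
inline std::uint64_t RawBits(double v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return bits;
}

// Build a quiet NaN carrying an arbitrary payload in the low mantissa bits.
// Assumes the IEEE 754 binary64 layout.
inline double NanWithPayload(std::uint64_t payload) {
  std::uint64_t bits =
      0x7FF8000000000000ULL | (payload & 0x0007FFFFFFFFFFFFULL);
  double v;
  std::memcpy(&v, &bits, sizeof(v));
  return v;
}

// Hypothetical canonicalization: one NaN, one zero, everything else unchanged.
inline double Canonicalize(double v) {
  if (std::isnan(v)) return std::numeric_limits<double>::quiet_NaN();
  if (v == 0.0) return 0.0;  // the comparison is true for both -0.0 and +0.0
  return v;
}

// Hash the raw bits directly: every distinct bit pattern is a distinct key.
inline std::size_t HashRaw(double v) {
  return std::hash<std::uint64_t>{}(RawBits(v));
}

// Hash after canonicalizing, so all NaNs (and both zeros) share one key.
inline std::size_t HashCanonical(double v) {
  return HashRaw(Canonicalize(v));
}

// Small self-check showing the difference between the two approaches.
inline void DemoChecks() {
  const double nan_a = NanWithPayload(1);
  const double nan_b = NanWithPayload(2);
  assert(std::isnan(nan_a) && std::isnan(nan_b));
  assert(RawBits(nan_a) != RawBits(nan_b));  // raw bits see two different keys
  assert(HashCanonical(nan_a) == HashCanonical(nan_b));
  assert(RawBits(0.0) != RawBits(-0.0));     // the sign bit differs
  assert(HashCanonical(0.0) == HashCanonical(-0.0));
}
```

Keeping -0.0 and 0.0 separate instead (case 1, Option 2) would only
require dropping the zero branch in Canonicalize.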


On Tuesday, February 26, 2019, Ravindra Pindikura <ravin...@dremio.com>
wrote:

>
>
> > On Feb 26, 2019, at 10:32 AM, Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > Implementing compute kernels that depend on hashing has raised a couple of
> > edge cases that are worth discussing.  In particular, the following points
> > need to be resolved (I opened a JIRA [1] to track the fixes):
> >
> > 1. How to handle -0.0 and 0.0?
> > - Option 1: Collapse to a single value (this is more in line with the
> > IEEE 754 spec, I believe)
> > - Option 2: Keep them as separate values (I believe this is how Java
> > handles them)
> > 2. How to handle NaN?
> > - Option 1: Do nothing with them (multiple values of NaN might occur in
> > hashtables)
> > - Option 2: Canonicalize to a single NaN (this is what Java does)
> >
> > I haven't investigated how DB systems handle these (if anyone knows and can
> > chime in I would appreciate it).  As a default, I think it might be nice to
> > align the C++ implementation with the way Java handles them, but I don't
> > have any strong opinions.
>
> I’m probably missing something obvious. But why not use the raw
> 4-byte/8-byte value underneath (treat it as uint32/uint64) for the hashing?
> I’m assuming that will give 1 -> Option 2, and 2 -> Option 2.
>
>
> >
> > Thanks,
> > Micah
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-4497
>
>
