Re: [Discuss][C++] Hashing floating point numbers

Antoine Pitrou Tue, 26 Feb 2019 05:48:24 -0800


On 26/02/2019 at 14:31, Ravindra Pindikura wrote:
> 
> 
>> On Feb 26, 2019, at 10:32 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> Implementing compute kernels that depend on hashing has raised a couple of
>> edge cases that are worth discussing.  In particular, the following points
>> need to be resolved (I opened a JIRA [1] to track the fixes):
>>
>> 1. How to handle -0.0 and 0.0?
>> - Option 1: Collapse to a single value (this is more in line with the
>> IEEE 754 spec, I believe)
>> - Option 2: Keep them as separate values (I believe this is how Java
>> handles them)
>> 2. How to handle NaN?
>> - Option 1: Do nothing with them (multiple values of NaN might occur in
>> hashtables)
>> - Option 2: Canonicalize to a single NaN (this is what Java does)
>>
>> I haven't investigated how DB systems handle these (if anyone knows and can
>> chime in I would appreciate it).  As a default, I think it might be nice to
>> align the C++ implementation with the way Java handles them, but I don't
>> have any strong opinions.
> 
> I’m probably missing something obvious. But, why not use the raw 
> 4-byte/8-byte value underneath (treat it as uint32/uint64) for the hashing? 
> I’m assuming that will give 1 -> option 2, and 2 -> Option 2.


That would give option 2 for the zeros (0.0 and -0.0 have different bit
patterns), but option 1 for NaN, as there's an extremely large number of
different NaN representations.

But, yes, that's the easiest option.
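
For illustration, here is a minimal sketch of the two approaches, assuming
IEEE 754 binary64 doubles. RawBits and CanonicalBits are hypothetical helper
names, not existing Arrow functions, and the result would still need to be fed
to an actual hash function. RawBits shows what hashing the underlying bytes
sees for each value; CanonicalBits is one possible way to get "option 1" for
zeros and "option 2" for NaN.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the 8 bytes of a double as an unsigned integer, unchanged.
// 0.0 and -0.0 give different results, and so do NaNs with different
// sign or payload bits.
inline std::uint64_t RawBits(double value) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "this sketch assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical canonicalizing variant: every NaN maps to one quiet NaN
// pattern and -0.0 maps to the bits of 0.0. Other values are untouched.
inline std::uint64_t CanonicalBits(double value) {
  if (std::isnan(value)) {
    return 0x7ff8000000000000ULL;  // assumed canonical quiet NaN (binary64)
  }
  if (value == 0.0) {  // true for both 0.0 and -0.0
    return 0;          // bit pattern of +0.0
  }
  return RawBits(value);
}
```

Whether the canonicalizing step is worth its cost on the hot path is a
separate question; the sketch only shows where the two policies differ.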

Regards

Antoine.
