Re: [Discuss][C++] Hashing floating point numbers

Wes McKinney Tue, 26 Feb 2019 11:54:46 -0800

In an analytics setting, my prior is that -0 and +0 should be considered
semantically "the same value", and likewise that all kinds of NaN should
be considered "the same value". It would be confusing (and likely
"wrong" in a practical setting) to get two kinds of zeros in the output
of an algorithm that involves a hash table, like Unique or ValueCounts.
That said, hashing floats should not be encouraged in general, but
sometimes people will hash the results of some operation that happens
to yield floats.
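A minimal C++ sketch of these semantics, assuming the hash-table kernel can be given custom hash and equality functors. `CanonicalizeDouble`, `DoubleBits`, `CanonicalDoubleHash`, `CanonicalDoubleEqual` and `CountUniqueDoubles` are hypothetical names for illustration, not Arrow APIs. Both the hash and the equality are derived from the same canonical bit pattern, so they cannot disagree.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>
#include <unordered_set>
#include <vector>

// Hypothetical helper: map a double to a canonical representative.
// -0.0 and +0.0 collapse to +0.0; every NaN (any sign or payload)
// collapses to a single quiet NaN.
inline double CanonicalizeDouble(double v) {
  if (std::isnan(v)) return std::numeric_limits<double>::quiet_NaN();
  if (v == 0.0) return 0.0;  // true for both +0.0 and -0.0
  return v;
}

// Raw bit pattern of a double (memcpy avoids strict-aliasing problems).
inline std::uint64_t DoubleBits(double v) {
  std::uint64_t bits;
  static_assert(sizeof bits == sizeof v, "expects a 64-bit double");
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// Hash and equality are both defined on the canonical bit pattern, so
// keys that compare equal are guaranteed to hash identically.
struct CanonicalDoubleHash {
  std::size_t operator()(double v) const {
    return std::hash<std::uint64_t>{}(DoubleBits(CanonicalizeDouble(v)));
  }
};

struct CanonicalDoubleEqual {
  bool operator()(double a, double b) const {
    return DoubleBits(CanonicalizeDouble(a)) ==
           DoubleBits(CanonicalizeDouble(b));
  }
};

// Hypothetical Unique-style kernel: number of distinct values under the
// canonical equality defined above.
inline std::size_t CountUniqueDoubles(const std::vector<double>& values) {
  std::unordered_set<double, CanonicalDoubleHash, CanonicalDoubleEqual> seen(
      values.begin(), values.end());
  return seen.size();
}
```

For example, `CountUniqueDoubles({0.0, -0.0, std::nan(""), 1.5, 1.5})` returns 3: one zero, one NaN and 1.5.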


On Tue, Feb 26, 2019 at 1:49 PM Antoine Pitrou <solip...@pitrou.net> wrote:
>
> On Tue, 26 Feb 2019 09:59:54 -0800
> Tim Armstrong <tarmstr...@cloudera.com.INVALID> wrote:
> > It's not a database thing, it's a floating point
> > number thing. If you're doing floating point arithmetic you can end up
> > with -0/+0 from expressions that should be equivalent.
>
> But we are not exactly dealing with arithmetic here... I'm not sure
> the IEEE FP standard was designed with database joins in mind.
>
> Granted, float hashing and float equality may be of dubious utility.
> I'm curious about the use cases.
>
> > You end up in a world of pain if your equality relation and your hash
> > function implementation are not aligned.
>
> This is not what I am suggesting.
>
> > So it's really a question of how you want to define equality (and whether
> > you want to have multiple definitions of equality for different purposes).
>
> I think this is the goal of this discussion.
>
> Regards
>
> Antoine.
>
>
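To illustrate the two points in the quoted exchange under plain C++ semantics (a sketch, not Arrow code): ordinary arithmetic can produce -0.0, and a container that uses IEEE 754 `==` as its equality treats -0.0 and +0.0 as one key but every NaN as a distinct key, because NaN compares unequal to itself. Mainstream standard libraries special-case zero in `std::hash<double>` so that the two zeros hash equally, which keeps that hash aligned with `==`. `NegativeZeroFromArithmetic`, `IsNegativeZero` and `CountUniqueIeee` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Arithmetic that "should" give zero can yield either sign of zero.
inline double NegativeZeroFromArithmetic() {
  return -1.0 * 0.0;  // -0.0 under IEEE 754
}

// -0.0 compares equal to +0.0, so the sign bit must be checked directly.
inline bool IsNegativeZero(double v) { return v == 0.0 && std::signbit(v); }

// Distinct-value count using the default equality (IEEE 754 operator==).
inline std::size_t CountUniqueIeee(const std::vector<double>& values) {
  std::unordered_set<double> seen(values.begin(), values.end());
  return seen.size();
}
```

With this definition of equality, `CountUniqueIeee({0.0, -0.0, std::nan(""), std::nan("")})` returns 3: the zeros merge, but each NaN stays a separate entry.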
