In an analytics setting, my prior is that -0 and +0 should be considered semantically "the same value", and likewise that all kinds of NaN should be considered "the same value". It would be confusing (and likely "wrong" in a practical setting) to obtain two kinds of zeros as the output of an algorithm involving a hash table, like Unique or ValueCounts. That said, although hashing of floats should not be encouraged in general, people will sometimes hash the results of some operation that happens to yield floats.
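
To make the idea concrete, here is a minimal Python sketch of what "the same value" could mean for a hash-based Unique. It is only an illustration under my own assumptions, not how any Arrow kernel is implemented: the names `canonical_bits`, `unique`, and `CANONICAL_NAN_BITS` are hypothetical, and the choice of canonical NaN bit pattern is arbitrary.

```python
import math
import struct

# Assumption: any fixed NaN bit pattern works as the canonical key;
# this one is a quiet NaN with positive sign and zero payload.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def canonical_bits(x: float) -> int:
    """Return the 64-bit pattern used as the hash key for x."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS  # every NaN maps to one key
    if x == 0.0:
        return 0  # -0.0 == 0.0 is True, so both map to the bits of +0.0
    # All other values are keyed by their exact IEEE 754 bit pattern.
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unique(values):
    """Hash-based unique that keeps the first value seen for each key."""
    seen = {}
    for v in values:
        seen.setdefault(canonical_bits(v), v)
    return list(seen.values())
```

With this keying, `unique([0.0, -0.0, float("nan"), float("nan"), 1.5])` yields three values: one zero, one NaN, and 1.5. Because the key function defines both the hash and the equality, the two cannot drift out of alignment.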

On Tue, Feb 26, 2019 at 1:49 PM Antoine Pitrou <solip...@pitrou.net> wrote:
>
> On Tue, 26 Feb 2019 09:59:54 -0800
> Tim Armstrong <tarmstr...@cloudera.com.INVALID> wrote:
> > It's not a database thing, it's a floating point
> > number thing. If you're doing floating point arithmetic you can end up
> > with -0/+0 from expressions that should be equivalent.
>
> But we are not exactly dealing with arithmetic here... I'm not sure
> the IEEE FP standard was designed with database joins in mind.
>
> Granted, float hashing and float equality may be of dubious utility.
> I'm curious about the use cases.
>
> > You end up in a world of pain if your equality relation and your hash
> > function implementation are not aligned.
>
> This is not what I am suggesting.
>
> > So it's really a question of how you want to define equality (and whether
> > you want to have multiple definitions of equality for different purposes).
>
> I think this is the goal of this discussion.
>
> Regards
>
> Antoine.
>
>