+1 from me. Thanks for driving this discussion so that we have the
rationale documented.

On Tue, Mar 5, 2019 at 12:16 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> OK, to summarize my understanding of the views expressed:
> 1.  People really shouldn't be grouping and joining on double-valued
> columns (but they do).
> 2.  The consensus (though not unanimous):
>    * Canonicalize NaNs and assume NaN == NaN for group by/unique kernels.
>    * Assume -0.0 == 0.0.
>
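A minimal Python sketch of the two conclusions above, assuming keys are IEEE 754 doubles hashed by their bit pattern. This is an illustration, not Arrow's kernel code: `canonical_bits`, `canonical_value`, `unique`, and `CANONICAL_NAN_BITS` are hypothetical names, and 0x7FF8000000000000 is simply one common quiet-NaN encoding picked as the canonical one.

```python
import math
import struct

# Hypothetical choice: every NaN is mapped to this one quiet-NaN bit pattern.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def canonical_bits(x: float) -> int:
    """64-bit hash key under which all NaNs collide and -0.0 equals 0.0."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    if x == 0.0:  # true for both 0.0 and -0.0
        x = 0.0   # drop the sign bit
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits


def canonical_value(x: float) -> float:
    """The float that the canonical key stands for (positive zero, one NaN)."""
    (v,) = struct.unpack("<d", struct.pack("<Q", canonical_bits(x)))
    return v


def unique(values):
    """Unique-kernel sketch: keep the first occurrence of each canonical key."""
    out = {}
    for v in values:
        out.setdefault(canonical_bits(v), canonical_value(v))
    return list(out.values())  # dicts preserve insertion order
```

For example, `unique([0.0, -0.0, 1.5, float("nan"), 1.5])` yields one positive zero, one 1.5, and one NaN, in that order.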
> I can update the JIRA with these conclusions unless someone strongly
> disagrees.
>
> Thanks,
> Micah
>
> On Tue, Feb 26, 2019 at 11:54 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > In an analytics setting my prior is that -0/+0 should be considered
> > semantically "the same value", and likewise that all kinds of NaN
> > should be considered "the same value". It would be confusing (and
> > likely "wrong" in a practical setting) to get two kinds of zero in
> > the output of an algorithm that uses a hash table, like Unique or
> > ValueCounts. That said, hashing floats should not be encouraged in
> > general, but sometimes people will hash the results of an operation
> > that happens to yield floats.
> >
> > On Tue, Feb 26, 2019 at 1:49 PM Antoine Pitrou <solip...@pitrou.net> wrote:
> > >
> > > On Tue, 26 Feb 2019 09:59:54 -0800
> > > Tim Armstrong <tarmstr...@cloudera.com.INVALID> wrote:
> > > > It's not a database thing, it's a floating point
> > > > number thing. If you're doing floating point arithmetic you can end up
> > > > with -0/+0 from expressions that should be equivalent.
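To illustrate Tim's point with a small Python sketch (the `float_bits` helper is a hypothetical name): "adding zero" to a result that underflowed to -0.0 produces +0.0. The two zeros compare equal with `==`, but their bit patterns differ, so a hash table keyed on raw bits would put them in different buckets.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Raw IEEE 754 binary64 bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


x = -1e-200 * 1e-200  # exact result -1e-400 underflows to -0.0
y = x + 0.0           # -0.0 + 0.0 is +0.0 under round-to-nearest

print(x == y)                          # IEEE equality says they are equal
print(hex(float_bits(x)), hex(float_bits(y)))
print(math.copysign(1.0, x), math.copysign(1.0, y))  # the sign bit differs
```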
> > >
> > > But we are not exactly dealing with arithmetic here...  I'm not sure
> > > the IEEE FP standard was designed with database joins in mind.
> > >
> > > Granted, float hashing and float equality may be of dubious utility.
> > > I'm curious about the use cases.
> > >
> > > > You end up in a world of pain if your equality relation and your hash
> > > > function implementation are not aligned.
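A hypothetical Python sketch of the misalignment described above: `MisalignedKey` hashes raw bits but compares with IEEE `==`, while `AlignedKey` derives both its hash and its equality from one canonicalized key. The class names are illustrative only; they exist to show set/dict behavior, not any library's API.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Raw IEEE 754 binary64 bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


class MisalignedKey:
    """Hash from raw bits, equality from IEEE ==: the pairing to avoid."""
    __slots__ = ("v",)

    def __init__(self, v: float):
        self.v = v

    def __hash__(self):
        return hash(float_bits(self.v))

    def __eq__(self, other):
        if not isinstance(other, MisalignedKey):
            return NotImplemented
        return self.v == other.v


class AlignedKey:
    """Hash and equality both come from the same canonical key."""
    __slots__ = ("k",)

    def __init__(self, v: float):
        if math.isnan(v):
            self.k = 0x7FF8000000000000  # assumed canonical NaN bits
        else:
            self.k = float_bits(0.0 if v == 0.0 else v)  # fold -0.0 into 0.0

    def __hash__(self):
        return hash(self.k)

    def __eq__(self, other):
        if not isinstance(other, AlignedKey):
            return NotImplemented
        return self.k == other.k
```

With `MisalignedKey`, 0.0 and -0.0 compare equal yet end up as two set entries, and a freshly constructed NaN key is never found. `AlignedKey` deduplicates both cases.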
> > >
> > > This is not what I am suggesting.
> > >
> > > > So it's really a question of how you want to define equality (and
> > > > whether you want to have multiple definitions of equality for
> > > > different purposes).
> > >
> > > I think this is the goal of this discussion.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> >
