+1 from me. Thanks for driving this discussion so we have the rationale documented.

On Tue, Mar 5, 2019 at 12:16 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> OK to summarize my understanding of the thoughts expressed:
> 1. People really shouldn't be trying to do things like grouping and
>    joining on double valued columns (but they do).
> 2. The consensus (but not 100% agreement):
>    * Canonicalize NaNs and assume NaN == NaN, for group by/unique kernels
>    * assume -0.0 == 0.0.
>
> I can update the JIRA with these conclusions unless someone strongly
> disagrees.
>
> Thanks,
> Micah
>
> On Tue, Feb 26, 2019 at 11:54 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > In an analytics setting my prior is that -0/+0 and all types of NaNs
> > should respectively be considered semantically to all be "the same
> > value". It would be confusing (and likely "wrong" in a practical
> > setting) to obtain two kinds of zeros as the output of an algorithm
> > involving a hash table, like Unique or ValueCounts. However: hashing
> > of floats should not be encouraged in general, but sometimes people
> > will hash the results of some operation that happens to yield floats.
> >
> > On Tue, Feb 26, 2019 at 1:49 PM Antoine Pitrou <solip...@pitrou.net>
> > wrote:
> > >
> > > On Tue, 26 Feb 2019 09:59:54 -0800
> > > Tim Armstrong <tarmstr...@cloudera.com.INVALID> wrote:
> > > > It's not a database thing, it's a floating point
> > > > number thing. If you're doing floating point arithmetic you can end up
> > > > with -0/+0 from expressions that should be equivalent.
> > >
> > > But we are not exactly dealing with arithmetic here... I'm not sure
> > > the IEEE FP standard was designed with database joins in mind.
> > >
> > > Granted, float hashing and float equality may be of dubious utility.
> > > I'm curious about the use cases.
> > >
> > > > You end up in a world of pain if your equality relation and your hash
> > > > function implementation are not aligned.
> > >
> > > This is not what I am suggesting.
> > >
> > > > So it's really a question of how you want to define equality (and
> > > > whether you want to have multiple definitions of equality for
> > > > different purposes).
> > >
> > > I think this is the goal of this discussion.
> > >
> > > Regards
> > >
> > > Antoine.
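
For the record, the consensus summarized in the quoted thread (canonicalize NaNs so NaN == NaN, and treat -0.0 == 0.0, in group by/unique kernels) could be sketched roughly as below. This is only an illustration in plain Python, not the actual kernel code; the names canonical_key and unique_floats are hypothetical and chosen just for this example. The idea is to map each double to a canonical byte key before it goes into the hash table, so that hashing and equality stay aligned.

```python
import math
import struct

# Byte pattern used for every NaN, whatever its sign or payload (assumed choice).
_CANONICAL_NAN = struct.pack("<d", float("nan"))
# Byte pattern used for both +0.0 and -0.0.
_CANONICAL_ZERO = struct.pack("<d", 0.0)


def canonical_key(x: float) -> bytes:
    """Return a byte key under which all NaNs collide and -0.0 equals 0.0."""
    if math.isnan(x):
        return _CANONICAL_NAN
    if x == 0.0:  # true for both +0.0 and -0.0
        return _CANONICAL_ZERO
    return struct.pack("<d", x)


def unique_floats(values):
    """Return the first occurrence of each distinct value, in input order."""
    seen = {}
    for v in values:
        # The dict hashes and compares the canonical bytes, not the float itself.
        seen.setdefault(canonical_key(v), v)
    return list(seen.values())


if __name__ == "__main__":
    data = [0.0, -0.0, float("nan"), -float("nan"), 1.5, 1.5]
    print(unique_floats(data))
```

With this keying, the sample input collapses to three groups (one zero, one NaN, and 1.5). Since the same key feeds both the hash and the equality check, the two cannot get out of sync.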