Hi Micah,
We have run into some of these issues on Impala in various guises,
including hash tables and min/max stats in Parquet. Treating +0/-0 as
indistinguishable for purposes of equality and grouping makes the most
sense and avoids most pitfalls.
NaN is messier. I don't think there's necessar
> My intuition would be to keep them as separate values. If you end up
> with negative zeros it probably means something. But I'm not a database
> expert.
I strongly disagree. It's not a database thing, it's a floating point
number thing. If you're doing floating point arithmetic you can end up with
I don't know the Arrow and parquet-cpp codebases but this is exactly what
we did in Impala to solve similar issues and we haven't had any performance
problems with it - it should get compiled to a single load/store on x86-64.
On Fri, May 17, 2019 at 12:22 PM Micah Kornfield
wrote:
> I recently r