Re: [Discuss][C++] Hashing floating point numbers

2019-02-25 Thread Tim Armstrong
Hi Micah,

We have run into some of these issues on Impala in various guises, including hash tables and min/max stats in parquet. Treating +0/-0 as indistinguishable for purposes of equality and grouping makes the most sense and avoids most pitfalls. NaN is messier. I don't think there's necessar
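
A minimal sketch of what "treating +0/-0 as indistinguishable" could look like when hashing doubles. The names `CanonicalBits` and `HashDouble` are hypothetical and are not taken from Impala or Arrow. The sketch only handles signed zeros; it deliberately leaves NaN alone, since the message above describes that case as messier.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

// Hypothetical helper: return the bit pattern of `value`, with -0.0
// collapsed onto +0.0 so that both zeros hash identically.
inline std::uint64_t CanonicalBits(double value) {
  if (value == 0.0) {
    // The comparison is true for both +0.0 and -0.0; force +0.0.
    value = 0.0;
  }
  std::uint64_t bits;
  // memcpy is the portable way to read the object representation.
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical helper: hash a double so that values comparing equal
// under == (NaN aside) also produce equal hashes.
inline std::size_t HashDouble(double value) {
  return std::hash<std::uint64_t>{}(CanonicalBits(value));
}
```

With this normalization, `HashDouble(0.0)` and `HashDouble(-0.0)` agree, which keeps the hash consistent with `==` for grouping and hash-table lookups.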

Re: [Discuss][C++] Hashing floating point numbers

2019-02-26 Thread Tim Armstrong
> My intuition would be to keep them as separate values. If you end up with negative zeros it probably means something. But I'm not a database expert.

I strongly disagree. It's not a database thing, it's a floating point number thing. If you're doing floating point arithmetic you can end up with
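
As a small illustration, assuming the point being made is that negative zeros fall out of ordinary IEEE 754 arithmetic, the sketch below produces one from a multiplication and shows that it compares equal to +0.0 while its bit pattern differs. The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: raw bit pattern of a double, with no normalization.
inline std::uint64_t RawBits(double value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical helper: multiplying a negative number by +0.0 yields -0.0
// under IEEE 754, without the caller ever writing a negative zero literal.
inline double NegativeZeroFromArithmetic() {
  double zero = 0.0;
  return -1.0 * zero;
}

// True when two doubles compare equal under == yet differ in their bits,
// which is exactly the situation that trips up bitwise hashing.
inline bool EqualButDifferentBits(double a, double b) {
  return a == b && RawBits(a) != RawBits(b);
}
```

Here `EqualButDifferentBits(NegativeZeroFromArithmetic(), 0.0)` holds, so a hash computed over the raw bits would place two equal values in different buckets.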

Re: [DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Tim Armstrong
I don't know the Arrow and parquet-cpp codebases but this is exactly what we did in Impala to solve similar issues and we haven't had any performance problems with it - it should get compiled to a single load/store on x86-64.

On Fri, May 17, 2019 at 12:22 PM Micah Kornfield wrote:
> I recently r
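
For context, a sketch of the common `memcpy`-based idiom for unaligned access. This assumes that idiom is the approach under discussion; the visible text does not name it, and `UnalignedLoad` / `UnalignedStore` are hypothetical names rather than functions from Impala, Arrow, or parquet-cpp. Mainstream optimizing compilers typically lower a fixed-size `memcpy` like this to a plain load or store on x86-64.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a T from an address with no alignment
// guarantee. Dereferencing a misaligned T* is undefined behavior;
// copying the bytes is well defined.
template <typename T>
inline T UnalignedLoad(const void* src) {
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}

// Hypothetical helper: write a T to an address with no alignment guarantee.
template <typename T>
inline void UnalignedStore(void* dst, T value) {
  std::memcpy(dst, &value, sizeof(T));
}
```

A caller would write `UnalignedLoad<std::uint32_t>(buffer + offset)` in place of `*reinterpret_cast<const std::uint32_t*>(buffer + offset)`.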