On 26/02/2019 at 14:31, Ravindra Pindikura wrote:
>
>> On Feb 26, 2019, at 10:32 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> Implementing compute kernels that depend on hashing has raised a couple of
>> edge cases that are worth discussing. In particular, the following points
>> need to be resolved (I opened a JIRA [1] to track the fixes):
>>
>> 1. How to handle -0.0 and 0.0?
>>    - Option 1: Collapse them to a single value (I believe this is more in
>>      line with the IEEE 754 spec)
>>    - Option 2: Keep them as separate values (I believe this is how Java
>>      handles them)
>> 2. How to handle NaN?
>>    - Option 1: Do nothing with them (multiple NaN values might occur in
>>      hash tables)
>>    - Option 2: Canonicalize to a single NaN (this is what Java does)
>>
>> I haven't investigated how DB systems handle these (if anyone knows and can
>> chime in, I would appreciate it). As a default, I think it might be nice to
>> align the C++ implementation with the way Java handles them, but I don't
>> have any strong opinions.
>
> I'm probably missing something obvious, but why not use the raw
> 4-byte/8-byte value underneath (treat it as uint32/uint64) for the hashing?
> I'm assuming that will give 1 -> Option 2, and 2 -> Option 2.
That would actually give option 1 in both cases, as there's an extremely large number of different NaN representations.

But, yes, that's the easiest option.

Regards

Antoine.
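To make the difference between the approaches concrete, here is a minimal Python sketch (an illustration only, not Arrow code; `raw_bits`, `from_bits`, `canonical_bits` and `count_distinct` are hypothetical helpers). Keying a hash table on the raw 8-byte pattern keeps NaNs with different payloads apart, and it also keeps -0.0 and 0.0 apart because their sign bits differ. The canonicalizing key maps every NaN to 0x7ff8000000000000, the value Java's `Double.doubleToLongBits` returns for any NaN, and can optionally collapse the two zeros.

```python
import math
import struct

# Single canonical quiet-NaN pattern; Java's Double.doubleToLongBits
# returns this value for every NaN.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def raw_bits(x: float) -> int:
    """Reinterpret the 8 bytes of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(bits: int) -> float:
    """Build a double from an explicit 64-bit pattern (used to make NaN payloads)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def canonical_bits(x: float, collapse_zero: bool) -> int:
    """Hash key after canonicalization.

    Every NaN maps to one pattern (NaN option 2). If collapse_zero is True,
    -0.0 and 0.0 share a key (zero option 1); otherwise they stay distinct
    (zero option 2), as with Java's doubleToLongBits.
    """
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    if collapse_zero and x == 0.0:  # -0.0 == 0.0 is True under IEEE 754
        return 0
    return raw_bits(x)


def count_distinct(values, key) -> int:
    """Number of distinct hash-table keys produced by `key`."""
    return len({key(v) for v in values})


values = [
    0.0,
    -0.0,
    from_bits(0x7FF8000000000000),  # quiet NaN, zero payload
    from_bits(0x7FF8000000000001),  # quiet NaN, payload 1
]

print(count_distinct(values, raw_bits))  # 4
print(count_distinct(values, lambda v: canonical_bits(v, collapse_zero=False)))  # 3
print(count_distinct(values, lambda v: canonical_bits(v, collapse_zero=True)))  # 2
```

With `collapse_zero=False` the canonicalizing key reproduces the Java behaviour described above (NaNs collapsed, signed zeros kept separate); the raw-bit key is the cheapest of the three but lets every NaN payload become its own hash-table entry.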