> Also, I don't understand why there are two versions of the hash table
> ("hashing32" and "hashing64" apparently). What's the rationale? How is
> the user meant to choose between them? Say a Substrait plan is being
> executed: which hashing variant is chosen and why?
It's not user-configurable.
Hi,
On 21/07/2023 at 15:58, Yaron Gvili wrote:
> A first approach I found is using `Hashing32` and `Hashing64`. This approach
> seems to be useful for hashing the fields composing a key of multiple rows when
> joining. However, it has a couple of drawbacks. One drawback is that if the
> number of
Yes, those are also the two main approaches to hashing in the code base that
I am aware of. I haven't seen any concrete comparison or benchmarks between
the two. If collisions between NA and 0 are a problem, it would probably be
OK to tweak the hash value of NA to something unique.
I susp
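
To make the NA suggestion above concrete, here is a minimal sketch in plain
Python. It is not Arrow's `Hashing32`/`Hashing64` code: the names `NULL_HASH`,
`hash_value` and `hash_column` are hypothetical, CRC32 is only a stand-in hash
function, and the column is assumed to hold optional 64-bit integers. The idea
is to reserve one hash value for NA so that it can never collide with 0 or any
other non-null value.

```python
import struct
import zlib

# Hypothetical sentinel: a 32-bit hash value reserved for NA (null) entries.
NULL_HASH = 0x9E3779B9


def hash_value(value):
    """Hash one optional signed 64-bit integer to an unsigned 32-bit value."""
    if value is None:
        return NULL_HASH
    # CRC32 over the little-endian bytes stands in for the real hash function.
    h = zlib.crc32(struct.pack("<q", value))
    # Keep the sentinel exclusive to NA: remap the rare non-null clash.
    return h ^ 1 if h == NULL_HASH else h


def hash_column(values):
    """Hash every entry of a column (a list of int or None)."""
    return [hash_value(v) for v in values]


hashes = hash_column([0, None, 0, 7])
print(hashes[0] == hashes[2])  # equal values hash alike
print(hashes[1] != hashes[0])  # NA never shares a hash with 0
```

The remapping step means one non-null value may share a hash with another
non-null value, which is an ordinary collision; only the NA hash is kept
unique.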
Hi,
What are the recommended ways to hash Arrow structures? What are the pros and
cons of each approach?
Looking a bit through the code, I've so far found two different hashing
approaches, which I describe below. Are there any others?
A first approach I found is using `Hashing32` and `Hashing6