OK, it came to my attention today that hash functions in Spark, like
xxhash64, always seed with 42:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L655

This is an issue if you want the hash of some value in Spark to match the
hash you compute with xxhash64 somewhere else, because, AFAICT, most any
other implementation defaults to seed=0.

I'm guessing there wasn't a *great* reason for this; 42 just seemed like a
nice default seed. And we can't change it now without possibly subtly
changing program behavior. And I'm guessing it would be messy to let the
function take a seed argument now, especially in SQL.

So I'm left with: I guess we should document that? I can do it if so.
And just a cautionary tale, I guess, for hash function users.
