On Tue, 26 May 2026, Théophile BRÉZOT <[email protected]> wrote:
> I am quite happy to read a discussion about hashing in Guile as this
> hash been one of the few real pain points in my experience: the hash is
> not behaving as one would expect from a generic hash function. What
> follows is not directly related to the question asked, so I hope this
> message is not misplaced.
>
>>  Two different lists, same hash values.  Weird.  But it's true:
>>
>>  scheme@(guile-user)> (= (hash '(1 2 3 (4)) most-positive-fixnum)
>>                          (hash '(1 2 3 (4 5)) most-positive-fixnum))
>>    => #t
>>
>> and it gets worse than that:
>>
>>  scheme@(guile-user)> (= (hash '(1 2 3 4 5) most-positive-fixnum)
>>                          (hash '(1 2 3 4 5 6) most-positive-fixnum))
>>    => #t
>
> As noted before, there is the issue with the recursion being limited to
> a rather low level, but hashing is also invariant across permutation of
> its inputs:
>
> ```
> scheme@(guile-user)> (= (hash '(1 2) most-positive-fixnum)
>                         (hash '(2 1) most-positive-fixnum))
> $7 = #t
> ```
>
> And for some data types, it is simply broken:
>
> ```
> scheme@(guile-user)> (= (hash (make-u8vector 3 0) most-positive-fixnum)
>                         (hash (make-u8vector 3 1) most-positive-fixnum))
> $6 = #t
> ```
>
> The same holds for `f32vector` and `f64vector` which are implemented on
> top of `u8vector`. Those two specific cases caused very-hard-to-debug
> issues in two of my projects...
>
>> One of the design goals for ‘hash’ is that it must be constant-time.
>> Thus, by definition, it cannot traverse entire lists or vectors.
>
> I don't know what is the story behind this function. I don't know why it
> is required to be constant-time. What I do know is that nowadays,
> non-cryptographic hash functions are so fast that hashing is rarely an
> issue [1], and that many languages have converged to using SipHash for
> hash-tables since it is fast and protects against hash flooding (if
> hash-table based on a poorly designed hash is exposed through a public
> API, an attacker may be able to forge malicious inputs that all hash to
> the same target bucket).
>
>> I think what we probably want is a strong collision resistant hash.  A
>> strong crypto hash would do the job for this. So the current core hash
>> would not be impacted.
>
> Cryptographic hash functions have properties that are not required for
> generic hash functions (like pre-image resistance) which makes them much
> slower. SipHash is formally a MAC since it requires a key to be secure,
> but it is really fast (see [1]).

In my experience, if written in C with hardware acceleration, a crypto
hash is very fast.  But I guess Siphash could be even faster.  However,
I don't think we could implement it in pure Scheme for performance
reasons but also because fixnum might not be able to encode the hash and
so this would generate a lot of allocation with bignum.

> I do not think going for a cryptographic hash function as default hash
> is a good idea. I would use SipHash for hash-tables and maybe another
> one like CityHash or XXH for internal use like generating identifiers.

Interesting.  I will have a look at those.

>> I had some discussions related to this, i.e. crypto hash, in core guile,
>> with Ludovic and Rob.  I think that having core support for some
>> the crypto hashing makes sense, especially if the compiler needs it.
>
> That said, I think having curated cryptographic primitives in the core
> language like Go does is a great step toward a more secure ecosystem.
> Even if the compiler does not need it in the end.

Well we can already offload that to guile-gcrypt.  So I wonder, what
would be the advantage of having the crypto hashing in core Guile vs
relying on an external dependency.

Thanks,
Olivier

-- 
Olivier Dion

Reply via email to