Spacing appears to have been messed up, so here's the same thing with
extra newlines
On 13/04/2025 18:55, Maxime Devos via General Guile related discussions
wrote:
On 11/04/2025 15:49, Olivier Dion wrote:
My goal here is that I have GOOPS object. The object is used to produce
a pure result which I can store in a cache on the disk, given the hash
of the object. I can then re-fetch the result on disk in another Guile
process if the hashes match. As you can see in the above code, GOOPS
instances get hashed by folding over their slots, which can include
procedures.
The usual solution to this kind of thing, is to:
(1) hash procedures by pointer (not consistent across processes, not
applicable for your case)
(2) don't hash procedures
(3) Allow classes to override how hashing is performed. Instead of
(define (hash value) [...]), you could have (define-method (hash
value) [...]). (Also be sure to define equality in a way such that 'a
= b -> hash(a)=hash(b)' - doesn't have to be '=' or 'equal?', but it
does need to be the equality procedure used for hash table things.)
(Also consider (hash value n) instead, so implementers have a clue for
reasonable output size.)
(3)(a) if a field is a thunk merely as a way to do laziness (and
forcing the lazy isn't expected to be computationally expensive),
force the lazy and hash the result
(3)(b) if the procedure field is irrelevant to the considered problem,
don't hash it (and adjust your expectations to make it not an element
of surprise)
(3)(c) if there is a limited set of procedures it could be, do
something like hashing by name (specifics depend on specific situation)
(3)(d) if the procedure field is relevant, and there is no apparent
way to hash something else instead, change this situation
(4) if nothing appears applicable for the problem at hand, change the
problem.
If you do go for bytecode hashing, you could consider looking at the
closure and hashing things in there as well, to reduce hash collisions.
I can then re-fetch the result on disk in another Guile process if
the hashes match.
You can't (at least not with the mentioned hash), because of likely
hash collisions. If the 'hash' function doesn't have collisions, then
either the input space is small (finite) and hence the 'hash' isn't
general-purpose (not always an issue, but in your case it seems like
it would be), or the output space is infinite, in which case it has
lost its function as a hash.
Typical hashing of the non-cryptographic kind aren't designed to
virtually eliminate hash collisions, rather they are designed to be
'good enough', and hash collisions are not expected to be eliminated,
instead they are accounted for in some way. In your case, the disk
acts as a hash table, so you could make a bucket list (so you would
need to also save the unhashed _keys_ instead of only their hash -
hashes then aren't to identify things on their own, but rather to
speed things up a lot).
Best regards, Maxime Devos