Spacing appears to have been messed up, so here's the same thing with extra newlines

On 13/04/2025 18:55, Maxime Devos via General Guile related discussions wrote:

On 11/04/2025 15:49, Olivier Dion wrote:
My goal here is that I have GOOPS object. The object is used to produce
a pure result which I can store in a cache on the disk, given the hash
of the object.  I can then re-fetch the result on disk in another Guile
process if the hashes match.  As you can see in the above code, GOOPS
instances get hashed by folding over their slots, which can include
procedures.

The usual solution to this kind of thing, is to:

(1) hash procedures by pointer (not consistent across processes, not applicable for your case)



(2) don't hash procedures



(3) Allow classes to override how hashing is performed. Instead of (define (hash value) [...]), you could have (define-method (hash value) [...]). (Also be sure to define equality in a way such that 'a = b -> hash(a)=hash(b)' - doesn't have to be '=' or 'equal?', but it does need to be the equality procedure used for hash table things.) (Also consider (hash value n) instead, so implementers have a clue for reasonable output size.)



(3)(a) if a field is a thunk merely as a way to do laziness (and forcing the lazy isn't expected to be computationally expensive), force the lazy and hash the result



(3)(b) if the procedure field is irrelevant to the considered problem, don't hash it (and adjust your expectations to make it not an element of surprise)



(3)(c) if there is a limited set of procedures it could be, do something like hashing by name (specifics depend on specific situation)



(3)(d) if the procedure field is relevant, and there is no apparent way to hash something else instead, change this situation



(4) if nothing appears applicable for the problem at hand, change the problem.

If you do go for bytecode hashing, you could consider looking at the closure and hashing things in there as well, to reduce hash collisions.

I can then re-fetch the result on disk in another Guile process if the hashes match.

You can't (at least not with the mentioned hash), because of likely hash collisions. If the 'hash' function doesn't have collisions, then either the input space is small (finite) and hence the 'hash' isn't general-purpose (not always an issue, but in your case it seems like it would be), or the output space is infinite, in which case it has lost its function as a hash.

Typical hashing of the non-cryptographic kind aren't designed to virtually eliminate hash collisions, rather they are designed to be 'good enough', and hash collisions are not expected to be eliminated, instead they are accounted for in some way. In your case, the disk acts as a hash table, so you could make a bucket list (so you would need to also save the unhashed _keys_ instead of only their hash - hashes then aren't to identify things on their own, but rather to speed things up a lot).

Best regards, Maxime Devos

Reply via email to