Re: Hashing a procedure object reliably

Maxime Devos via General Guile related discussions Sun, 13 Apr 2025 10:01:52 -0700


On 11/04/2025 15:49, Olivier Dion wrote:

> My goal here is that I have GOOPS object.  The object is used to produce
> a pure result which I can store in a cache on the disk, given the hash
> of the object.  I can then re-fetch the result on disk in another Guile
> process if the hashes match.  As you can see in the above code, GOOPS
> instances get hashed by folding over their slots, which can include
> procedures.


The usual solutions to this kind of thing are:

(1) Hash procedures by pointer (not consistent across processes, so not
    applicable for your case).

(2) Don't hash procedures.

(3) Allow classes to override how hashing is performed. Instead of
    (define (hash value) [...]), you could have
    (define-method (hash value) [...]). (Also be sure to define equality
    in a way such that 'a = b -> hash(a) = hash(b)'. It doesn't have to
    be '=' or 'equal?', but it does need to be the equality procedure
    used for hash table things.) (Also consider (hash value n) instead,
    so implementers have a clue for reasonable output size.)

    (a) If a field is a thunk merely as a way to do laziness (and
        forcing the lazy isn't expected to be computationally
        expensive), force the lazy and hash the result.

    (b) If the procedure field is irrelevant to the considered problem,
        don't hash it (and adjust your expectations to make it not an
        element of surprise).

    (c) If there is a limited set of procedures it could be, do
        something like hashing by name (specifics depend on the specific
        situation).

    (d) If the procedure field is relevant, and there is no apparent way
        to hash something else instead, change this situation.

(4) If nothing appears applicable for the problem at hand, change the
    problem.
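A rough sketch of what (3) with (a), (b) and (c) could look like. All
names here (stable-hash, <job>, its slots, known-combiners) are made up
for illustration, and whether Guile's built-in 'hash' gives the same
value in another process or Guile version is an assumption you would
need to check for the kinds of values you store.

```scheme
(use-modules (oop goops))

;; Hypothetical generic; a separate name avoids shadowing core `hash'.
(define-generic stable-hash)

;; Default method: plain data falls back to the built-in `hash'.
(define-method (stable-hash value (n <integer>))
  (hash value n))

;; Hypothetical class with three procedure-valued slots.
(define-class <job> ()
  (name    #:init-keyword #:name    #:getter job-name)     ; plain data
  (input   #:init-keyword #:input   #:getter job-input)    ; thunk, laziness only
  (logger  #:init-keyword #:logger  #:getter job-logger)   ; irrelevant to result
  (combine #:init-keyword #:combine #:getter job-combine)) ; one of a known set

;; (3)(c): the limited set of procedures the `combine' slot may hold.
(define known-combiners
  (list (cons + 'sum) (cons * 'product)))

(define (combiner-name proc)
  (or (assq-ref known-combiners proc)
      (error "unknown combiner" proc)))

;; The data that identifies a job: no procedures are left in it.
(define (job-key job)
  (list (job-name job)
        ((job-input job))                    ; (3)(a): force the thunk
        (combiner-name (job-combine job))))  ; (3)(c): hash by name
                                             ; (3)(b): logger is skipped

;; The class overrides how it is hashed.
(define-method (stable-hash (job <job>) (n <integer>))
  (hash (job-key job) n))

;; Equality is defined on the same key, so a = b -> hash(a) = hash(b).
(define (job=? a b)
  (equal? (job-key a) (job-key b)))
```

Deriving both the hash and the equality from job-key is one simple way
to keep the two consistent with each other.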

If you do go for bytecode hashing, you could consider looking at the
closure and hashing things in there as well, to reduce hash collisions.

> I can then re-fetch the result on disk in another Guile process if the
> hashes match.

You can't (at least not with the mentioned hash), because of likely hash
collisions. If the 'hash' function doesn't have collisions, then either
the input space is small (finite) and hence the 'hash' isn't
general-purpose (not always an issue, but in your case it seems like it
would be), or the output space is infinite, in which case it has lost
its function as a hash.
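To make the counting argument concrete: (hash key n) can only return n
different values, so any n+1 distinct keys must contain a collision. A
small sketch (find-collision is a made-up helper, not part of Guile):

```scheme
;; Return a pair (k1 . k2) of two keys from KEYS for which
;; (hash k1 n) = (hash k2 n), or #f if there is no such pair.
(define (find-collision keys n)
  (let ((seen (make-hash-table)))       ; maps hash value -> first key seen
    (let loop ((keys keys))
      (if (null? keys)
          #f
          (let* ((k (car keys))
                 (h (hash k n))
                 (prev (hashv-get-handle seen h)))
            (if prev
                (cons (cdr prev) k)     ; two keys, same hash value
                (begin
                  (hashv-set! seen h k)
                  (loop (cdr keys)))))))))

;; 17 distinct keys, 16 possible hash values: a collision is guaranteed.
(define collision (find-collision (iota 17) 16))
```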

Typical hash functions of the non-cryptographic kind aren't designed to
virtually eliminate hash collisions; rather, they are designed to be
'good enough'. Hash collisions are not expected to be eliminated;
instead they are accounted for in some way. In your case, the disk acts
as a hash table, so you could make a bucket list (so you would need to
also save the unhashed _keys_ instead of only their hash - hashes then
aren't to identify things on their own, but rather to speed things up a
lot).
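A minimal sketch of such an on-disk bucket list. The names (bucket-file,
cache-ref, cache-set!) and the file layout are made up, the bucket count
is deliberately tiny so collisions actually happen, and it assumes that
keys and results survive a 'write'/'read' round trip and that 'hash'
gives the same value in the process that reads the cache.

```scheme
(use-modules (srfi srfi-1))             ; alist-delete

(define bucket-count 16)

;; The hash only selects a bucket file; it does not identify the key.
(define (bucket-file dir key)
  (string-append dir "/" (number->string (hash key bucket-count))))

;; A bucket is an association list of (key . result) pairs.
(define (read-bucket file)
  (if (file-exists? file)
      (call-with-input-file file read)
      '()))

;; Return the (key . result) pair for KEY, or #f on a miss.
;; `assoc' compares the stored, unhashed keys with `equal?'.
(define (cache-ref dir key)
  (assoc key (read-bucket (bucket-file dir key))))

;; Store RESULT under KEY, replacing any older entry for the same key.
(define (cache-set! dir key result)
  (unless (file-exists? dir) (mkdir dir))
  (let* ((file (bucket-file dir key))
         (bucket (alist-delete key (read-bucket file))))
    (call-with-output-file file
      (lambda (port)
        (write (cons (cons key result) bucket) port)))))
```

Two keys with the same hash end up in the same file, and the lookup
still returns the right result because the keys themselves are compared.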


Best regards, Maxime Devos
