On Fri, 11 Apr 2025, Vivien Kraus <viv...@planete-kraus.eu> wrote:
> Hello,
>
> Le vendredi 11 avril 2025 à 09:49 -0400, Olivier Dion a écrit :
>> But some fields seems to
>> not be taken into account and you will end up with fetching
>> substitute
>> in the store.  I want to avoid this element of surprise in my case.

Side note, my problem is outside of Guix entirely.  I just gave this as
an example of caching behavior I want to avoid in my project.

> From what I understand, here (problem A), you want to avoid a hash
> collision (two different codes hashing to the same value)…

Right.  We don't want `(lambda (x y) (+ x y))' to have the same hash as
`(lambda (x y) (* x y))'.  Otherwise, users would get a big suprise to
see that it get the same cached result.

>> Like mentioned in the comment (look for the `(procedure? value)'
>> clause), this does _not_ yield the same hash for a procedure if the
>> module where this procedure came from is compiled.
>
> And here (problem B), you want to avoid having equivalent code (source
> vs bytecode) hashing to different values.
>
> Did I summarize it correctly?

I think the definition of equivalent here is that the pure behavior of
the procedures are the same.  Say we're hashing `(lambda (x y) (+ x y))'
and `(lambda (u v) (+ u v))'.  These procedures does not have equivalent
sources, but I do have equivalent semantics.  This is why the bytecode
is hashed instead, because it represents the semantic of the procedure
at the machine level.  Alas, If you rehash the same procedure after
compilation (often auto-compilation), you get a different hash of
course.

> Solving A is easy: generate a new unique identifier each time you hash
> any functional value (or refuse to cache the result).

This assumes the same hashing ordering from process to process.  Hash
must be identical for every invokation of Guile.  But really A is not a
problem here I think.

> I don’t have an answer for B, but others may.

I do have an idea.  I'm willing to have `define#' and `lambda#'
syntax-transformers.  Users would use it to define procedures that are
important to hash.  The idea would be to evaluate the source of the
procedure at expansion time and hash this procedure bytecode.  The
resulting hash is thus computed at expansion time and stored as a
procedure property.

Here's the implementation:

--8<---------------cut here---------------start------------->8---
  (use-modules
   (ice-9 match)
   (rnrs bytevectors)
   ((srfi srfi-69) #:prefix srfi:)
   (system foreign)
   (system vm program)
   )

  (eval-when (expand load compile)
    (define (procedure-bytecode proc)
      (match (program-address-range proc)
        ((start . stop)
         (pointer->bytevector (make-pointer start)
                              (- stop start)))))
    (define (hash-bytevector bv)
      (let loop ((acc (srfi:hash #vu8()))
                 (k 0))
        (if (= k (bytevector-length bv))
            acc
            (loop (+ acc (* (1+ k) (srfi:hash (bytevector-u8-ref bv k))))
                  (1+ k)))))
    (define (hash-expression formals body body*)
      (false-if-exception
       (hash-bytevector
        (procedure-bytecode
         (eval `(lambda (,@formals) ,body ,@body*)
               (current-module)))))))

  (define-syntax define#
    (lambda (stx)
      (syntax-case stx ()
        ((_ (name formals ...) body body* ...)
         (identifier? #'name)
         (let ((the-hash (hash-expression (syntax->datum #'(formals ...))
                                          (syntax->datum #'body)
                                          (syntax->datum #'(body* ...)))))
           (with-syntax ((set-hash! (if the-hash
                                        #`(set-procedure-property! name 'hash 
#,the-hash)
                                        #'(begin))))
             #'(begin
                 (define* (name formals ...)
                   body body* ...)
                 set-hash!)))))))

  (define-syntax lambda#
    (lambda (stx)
      (syntax-case stx ()
        ((_ (formals ...) body body* ...)
         (let ((the-hash (hash-expression (syntax->datum #'(formals ...))
                                          (syntax->datum #'body)
                                          (syntax->datum #'(body* ...)))))
           #`(let ((proc (lambda (formals ...) body body* ...)))
               #,(if the-hash
                     #`(set-procedure-property! proc 'hash #,the-hash)
                     #'(begin))
               proc))))))

  (define (procedure-hash proc)
    (or (procedure-property proc 'hash) 0))

  (define# (foo x y)
    (+ x y))

  (define# (bar x y)
    (+ x y))

  (pk
   (procedure-hash foo)
   (procedure-hash bar)
   (procedure-hash (lambda# (u v) (+ u v)))
   )
--8<---------------cut here---------------end--------------->8---

> However, considering problem B, it may be more complex than just
> source and bytecode. What if the GOOPS object creator uses different
> dependencies, but they turn out to have the same behavior? I’m
> thinking about a non-free/free replacement situation. Since the object
> creator has a clearer vision of the functions and dependencies they
> use, maybe you could ask them to provide a version number of their
> relevant functional code, and hash this version number instead?

I'm not sure to understand your concern here about non-free
replacements.  Could you expand on this by giving an example?

Thanks,
Olivier
-- 
Olivier Dion
oldiob.ca

Reply via email to