You may already know that `datum-intern-literal` is implemented similarly, but `datum-intern-literal` hashes only certain kinds of values. The `canonicalize` function below accepts arbitrary values, so the `equal-hash-code` and `equal?` operations by the hash table can call arbitrary code. Running arbitrary code in atomic mode is generally not a good idea, and it's not allowed in the run-time system.
Also, if the `canonical-values` table were truly global within the run-time system, then it would add a problematic communication channel (via mutable values). Assuming well-behaved arguments, I think that `canonicalize` needs to remove the value from `b` before deciding whether to return that value, otherwise the value might get GCed between the time that the weak box is found it in the table and the time that it's content is returned. For that second problem, the `datum-intern-literal` implementation can "cheat", since it gets an extra level of atomicity with respect to GCs (by restricting to certain values and being in the run-time system). The `datum-intern-literal` function also avoids a weak-box allocation through an internal operation to extract the key (instead of value) from a hash-table match. I see a speed difference between `datum-intern-literal` and `canonicalize` when adding new strings, and I think the difference is due to allocating the extra box. Looking up a previously registered string with `canonicalize` is just as fast, though. I don't think `canonicalize` would get much faster by being built-in. At Mon, 30 Jun 2014 20:54:38 -0400, Tony Garnock-Jones wrote: > Hi all, > > I'd love any comments on the following as a general technique for > "hash-consing" - that is, making (equal?) imply (eq?). (Corollary: two > canonicalized values will be (eq?) iff they are (equal?).) > > Is it worth considering this as something to push into the C runtime for > speed? > > #lang racket/base > (provide canonicalize) > (require ffi/unsafe/atomic) > > (define canonical-values (make-weak-hash)) > > (define (canonicalize val) > (start-atomic) > (define b (hash-ref canonical-values > val > (lambda () > (hash-set! canonical-values > val > (make-weak-box val)) > #f))) > (end-atomic) > (if b (weak-box-value b) val)) > > See also > https://gist.github.com/tonyg/86d0052c9c28431e1546#file-hashcons-rkt-L9-L17, > where I have test cases demonstrating simple uses of canonicalize and > its interactions with the garbage collector. > > Regards, > Tony > ____________________ > Racket Users list: > http://lists.racket-lang.org/users ____________________ Racket Users list: http://lists.racket-lang.org/users