I may need a ref world that could get bigger than main memory can easily cope with. I was wondering if something like this would work well:
```clojure
(import '(java.util.concurrent ConcurrentHashMap)
        '(java.lang.ref WeakReference))

(defvar- node-cache (ConcurrentHashMap. 128 0.75 128))

(defn- dead-entries-keys []
  (doall
    (filter identity
            (for [entry node-cache]
              (let [val (.getValue entry)]
                (if (instance? WeakReference val)
                  (if-not (.get val)
                    (.getKey entry))))))))

(defn- prune-dead-entries []
  (doseq [k (dead-entries-keys)]
    (.remove node-cache k)))

(defn- undertaker []
  (loop []
    (Thread/sleep 60000)
    (prune-dead-entries)
    (recur)))

(.start (Thread. undertaker))

(defn get-node [node-id base-dir]
  (let [dummy (Object.)]
    (loop []
      (.putIfAbsent node-cache node-id dummy)
      (let [val (.get node-cache node-id)]
        (if (= val dummy)
          ;; Node not already cached. We own the entry.
          (let [noderef (ref @(future (load-node node-id base-dir)))]
            (.put node-cache node-id (WeakReference. noderef))
            noderef)
          (if-not (instance? WeakReference val)
            ;; Node not already cached. Someone else owns the entry.
            ;; Wait for the node to be loaded.
            (do
              (Thread/sleep 10)
              (recur))
            (let [noderef (.get val)]
              (if noderef
                ;; Already cached.
                noderef
                (do
                  ;; Dead entry. Prune it and retry.
                  (.replace node-cache node-id val dummy)
                  (recur))))))))))

(defvar- to-be-saved (ref #{}))

(defn put-node [noderef base-dir]
  (dosync
    (commute to-be-saved #(conj % [noderef base-dir]))))

(defn- saver []
  (loop []
    (let [entry (dosync
                  (let [entry (first @to-be-saved)]
                    (if entry
                      (commute to-be-saved disj entry))
                    entry))]
      (if entry
        (let [[noderef base-dir] entry]
          (save-node @noderef base-dir))
        (Thread/sleep 100)))
    (recur)))

(.start (Thread. saver))
```

Here, the objects subjected to persistence are called "nodes". It's assumed there's a load-node that can find a node on disk given a "node-id" for it, which should be a string or other bit of data, and a save-node that will save a node likewise; nodes would have to carry their ids around with them. A node could e.g. be an [id object] vector. There'd also need to be a way of making globally unique ids, and other details to work out, but the basic concept seems sound.
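Since the whole claim protocol leans on ConcurrentHashMap's atomic operations, here's a stripped-down Java sketch of the same idea (names are hypothetical, and the loaded value is stored directly rather than behind a WeakReference to keep the demo small). Eight threads race for the same id, but the "disk load" happens exactly once:

```java
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    static int loads = 0; // counts simulated disk loads; should end up 1

    static synchronized String loadNode(String id) {
        loads++;                               // stand-in for the real disk I/O
        return "node:" + id;
    }

    // Simplified claim-or-wait protocol: putIfAbsent installs a marker unique
    // to this invocation; whoever then sees their own marker owns the load.
    static Object getNode(String id) throws InterruptedException {
        Object claim = new Object();
        while (true) {
            cache.putIfAbsent(id, claim);
            Object val = cache.get(id);
            if (val == claim) {                // we won: do the I/O, publish it
                Object node = loadNode(id);
                cache.put(id, node);
                return node;
            }
            if (val instanceof String)         // someone else already loaded it
                return val;
            Thread.sleep(10);                  // someone else is mid-load; wait
        }
    }

    public static void main(String[] args) throws Exception {
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++)
            ts[i] = new Thread(() -> {
                try { getNode("n1"); } catch (InterruptedException e) { }
            });
        for (Thread t : ts) t.start();
        for (Thread t : ts) t.join();
        System.out.println(loads);             // 1 -- the load ran exactly once
    }
}
```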
The get-node function is the complicated bit: it tries to atomically retrieve a node from the ConcurrentHashMap if it's already cached there, and to load it from disk if not. The @(future (load-node ...)) bit is so that get-node can be called in a transaction without plotzing; the I/O shouldn't get done twice even if the transaction retries, because of how get-node is designed. The dummy Object marks a cache entry that's under construction with a marker unique to the get-node invocation, so it can tell if a different get-node invocation beat it to the punch by a smidgen. In that case it sleeps 0.01 second and retries until the other invocation has loaded the node.

Of course, to avoid all the nodes accumulating in memory, something has to kill off the ones not in use, and that's java.lang.ref.WeakReference. An undertaker thread sweeps through the cache once a minute, pruning dead entries. get-node may itself run into a dead entry, in which case it tries to prune it and then reruns itself; the three-parameter replace method is used so that if another get-node invocation has concurrently put in a new entry, it doesn't get clobbered.

There's also a put-node to signal that a node has changes in need of persisting. It schedules nodes for saving using a private set held in a ref, updated in a transaction. The saver thread checks every tenth of a second for the set to be non-empty, and whenever it is, it drains it one entry at a time, calling save-node outside any transaction. Both load-node and save-node are called out-transaction. The set operations commute; that, plus ConcurrentHashMap, avoids contention for access to the node cache and the save queue. (Actually, perhaps a LinkedBlockingQueue would be better for the latter?)

If there are any problems likely with this approach, I'm all ears. (I know a straightforward spit of the node to (filename-from-id-and-basedir node) has durability problems for save-node if e.g. the power goes out mid-write. That's an issue for the save-node implementation, though, not for the above.
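The undertaker's sweep is easy to demo in plain Java. In this sketch the entrySet().removeIf pass is the moral equivalent of dead-entries-keys plus prune-dead-entries; since System.gc() is only a hint, the demo nudges the collector in a loop until the unreferenced entry's referent is actually gone:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, WeakReference<Object>> cache = new ConcurrentHashMap<>();
        Object live = new Object();                           // strongly held: survives GC
        cache.put("live", new WeakReference<>(live));
        cache.put("dead", new WeakReference<>(new Object())); // no strong refs left
        // GC is only a hint, so retry until the dead referent is collected
        for (int i = 0; i < 50 && cache.get("dead").get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        // the undertaker's sweep: drop entries whose referent has been collected
        cache.entrySet().removeIf(e -> e.getValue().get() == null);
        System.out.println(cache.containsKey("dead") + " " + cache.containsKey("live"));
        System.identityHashCode(live);                        // keep `live` reachable to here
    }
}
```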
Some scheme involving temporary files and .renameTo ought to take care of it anyway. It's also incomplete in that it has no function for creating a new node ex nihilo.) If this actually works, it adds lightweight persistence to the ref world almost transparently to users, who just need to package their ref objects in nodes and extract them again when needed. That likely just means adding an :id field to their maps, adding an id at the start of their vectors, or adding one more key to their assoc-ins and update-ins.
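For the save-node durability caveat, the temp-file-plus-rename scheme might look like this in Java terms (saveNode is hypothetical; the rename is atomic on POSIX filesystems as long as the temp file lives in the same directory as the target):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class Main {
    // Hypothetical save-node: write the node to a temp file in the same
    // directory, then rename it over the real file, so a crash mid-write
    // leaves either the old version or the new one, never a torn file.
    static void saveNode(String nodeId, byte[] contents, File baseDir) throws IOException {
        File target = new File(baseDir, nodeId);
        File tmp = File.createTempFile(nodeId, ".tmp", baseDir);
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(contents);
            out.getFD().sync();               // force the bytes to disk before the rename
        }
        if (!tmp.renameTo(target))            // atomic on POSIX within one filesystem
            throw new IOException("rename to " + target + " failed");
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        saveNode("node-demo-42", "{:id \"node-demo-42\" :value 1}".getBytes(), dir);
        File f = new File(dir, "node-demo-42");
        System.out.println(f.length() > 0);   // true -- the node file exists
        f.delete();
    }
}
```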