Re: Memoize improvement

John Harrop Sat, 31 Oct 2009 06:18:17 -0700

On Fri, Oct 30, 2009 at 7:57 PM, Stefan Arentz <ste...@arentz.ca> wrote:


> This is some of my first Clojure code so it might not be the
> greatest ... yet!
>

It's quite interesting.

(defn my-memoize
>   "Returns a memoized version of a referentially transparent
> function. The
>   memoized version of the function keeps a cache of the mapping from
> arguments
>   to results and, when calls with the same arguments are repeated
> often, has
>   higher performance at the expense of higher memory use. Cached
> results are
>   removed from the cache when their time to live value expires."
>   [function time-to-live]
>   (let [cached-results (atom {})]
>     (fn [& arguments]
>       (swap! cached-results expire-cached-results time-to-live)
>       (println cached-results)
>       (if-let [entry (find @cached-results arguments)]
>         (:result (val entry))
>         (let [result (apply function arguments)]
>           (swap! cached-results assoc arguments { :result
> result :time (System/currentTimeMillis)})
>           result)))))


I'd make a couple of changes, though. First:

(defn my-memoize
  "Returns a memoized version of a referentially transparent
   function. The memoized version of the function keeps a cache of
   the mapping from arguments to results and, when calls with the
   same arguments are repeated often, has higher performance at
   the expense of higher memory use. Cached results are removed
   from the cache when their time to live value expires."
  [function time-to-live]
  (let [cached-results (atom {})]
    (fn [& arguments]
      (swap! cached-results expire-cached-results time-to-live)
      (let [result (if-let [r (get @cached-results arguments)]
                     (:result r)
                     (apply function arguments))]
        (swap! cached-results assoc arguments
          {:result result :time (System/currentTimeMillis)})
        result))))

This resets the expiry clock on a result if the same arguments occur again.
So instead of a result expiring n seconds after being put in, it only does
so if there's an n-second period during which that result isn't used at all.

A more sophisticated change is to make expire-cached-results happen even if
the function stops being called for a long time. That requires threading.
Most efficient might be to dispense with the {:result result :time time} and
just map arguments onto results, but whenever one does an (apply function
arguments) one also assoc's the result into the map in the atom AND starts a
thread:

(.start (Thread. #(Thread/sleep time-to-live) (swap! cached-results dissoc
arguments)))

Poof! After time-to-live that particular result disappears. This reverts to
having the result expire n seconds after being put in, though, and it also
potentially spawns large numbers of threads, albeit mostly all sleeping at
once.

The "result goes away if unused for n seconds" behavior and accumulation of
threads can both be nixed, though:

(defn my-memoize
  "Returns a memoized version of a referentially transparent
   function. The memoized version of the function keeps a cache of
   the mapping from arguments to results and, when calls with the
   same arguments are repeated often, has higher performance at
   the expense of higher memory use. Cached results are removed
   from the cache when their time to live value expires."
  [function time-to-live]
  (let [cached-results (ref {})
        last-time (ref (System/currentTimeMillis))
        expiry-queue (ref [])
        process-queue #(when-not (empty? @expiry-queue)
                         (Thread/sleep
                           (:wait (get @cached-results (first
@expiry-queue))))
                           (dosync
                             (let [args (first (ensure expiry-queue))
                                   item (get (ensure cached-results) args)
                                   t (- (:time item) 100)]
                               (if (<= t (System/currentTimeMillis))
                                 (alter cached-results dissoc args))
                               (ref-set expiry-queue
                                 (vec (rest @expiry-queue)))))
                         (recur))
        ts-agent (agent nil)]
    (fn [& args]
      (let [result (if-let [r (get @cached-results args)]
                     (:result r)
                     (apply function args))]
        (dosync
          (let [t (+ time-to-live (System/currentTimeMillis))
                l (max (ensure last-time) (System/currentTimeMillis))
                w (- t l)
                q (ensure expiry-queue)]
            (println t w q @cached-results)
            (alter cached-results assoc args
              {:result result :time t :wait w})
            (ref-set last-time t)
            (alter expiry-queue conj args)
            (if (empty? q)
              (send ts-agent (fn [_] (.start (Thread. process-queue)))))))
        result))))

This is tested and works:

user=> (def x (my-memoize #(do (println %) (+ % 10)) 1000))
#'user/x
user=> (doall (for [n [1 2 4 1 2 3 1 2 3 2 3 2 3 2 3 1 4]] (do (Thread/sleep
300) (x n))))
1
1256994331877 999 [] {}
2
1256994332178 301 [(1)] {(1) {:result 11, :time 1256994331877, :wait 999}}
4
1256994332478 300 [(1) (2)] {(2) {:result 12, :time 1256994332178, :wait
301}, (1) {:result 11, :time 1256994331877, :wait 999}}
1256994332778 300 [(1) (2) (4)] {(4) {:result 14, :time 1256994332478, :wait
300}, (2) {:result 12, :time 1256994332178, :wait 301}, (1) {:result 11,
:time 1256994331877, :wait 999}}
1256994333078 300 [(2) (4) (1)] {(4) {:result 14, :time 1256994332478, :wait
300}, (2) {:result 12, :time 1256994332178, :wait 301}, (1) {:result 11,
:time 1256994332778, :wait 300}}
3
1256994333378 300 [(4) (1) (2)] {(4) {:result 14, :time 1256994332478, :wait
300}, (2) {:result 12, :time 1256994333078, :wait 300}, (1) {:result 11,
:time 1256994332778, :wait 300}}
1256994333679 301 [(1) (2) (3)] {(3) {:result 13, :time 1256994333378, :wait
300}, (2) {:result 12, :time 1256994333078, :wait 300}, (1) {:result 11,
:time 1256994332778, :wait 300}}
1256994333979 300 [(2) (3) (1)] {(3) {:result 13, :time 1256994333378, :wait
300}, (2) {:result 12, :time 1256994333078, :wait 300}, (1) {:result 11,
:time 1256994333679, :wait 301}}
1256994334279 300 [(3) (1) (2)] {(3) {:result 13, :time 1256994333378, :wait
300}, (2) {:result 12, :time 1256994333979, :wait 300}, (1) {:result 11,
:time 1256994333679, :wait 301}}
1256994334579 300 [(1) (2) (3)] {(3) {:result 13, :time 1256994334279, :wait
300}, (2) {:result 12, :time 1256994333979, :wait 300}, (1) {:result 11,
:time 1256994333679, :wait 301}}
1256994334879 300 [(2) (3) (2)] {(3) {:result 13, :time 1256994334279, :wait
300}, (2) {:result 12, :time 1256994334579, :wait 300}}
1256994335179 300 [(3) (2) (3)] {(3) {:result 13, :time 1256994334879, :wait
300}, (2) {:result 12, :time 1256994334579, :wait 300}}
1256994335479 300 [(2) (3) (2)] {(3) {:result 13, :time 1256994334879, :wait
300}, (2) {:result 12, :time 1256994335179, :wait 300}}
1256994335779 300 [(3) (2) (3)] {(3) {:result 13, :time 1256994335479, :wait
300}, (2) {:result 12, :time 1256994335179, :wait 300}}
1256994336079 300 [(2) (3) (2)] {(3) {:result 13, :time 1256994335479, :wait
300}, (2) {:result 12, :time 1256994335779, :wait 300}}
1
1256994336379 300 [(3) (2) (3)] {(3) {:result 13, :time 1256994336079, :wait
300}, (2) {:result 12, :time 1256994335779, :wait 300}}
4
1256994336679 300 [(2) (3) (1)] {(1) {:result 11, :time 1256994336379, :wait
300}, (3) {:result 13, :time 1256994336079, :wait 300}, (2) {:result 12,
:time 1256994335779, :wait 300}}
(11 12 14 11 12 13 11 12 13 12 13 12 13 12 13 11 14)

You can see how the expiry queue is consumed at one end by the expiry thread
and appended to at the other end. The shortest time until the next item
might expire is stored in the :wait key, and the thread peeks at the queue,
then sleeps that long before popping the queue. If the queue is empty it
quits, but if the queue is empty the enqueueing of a new item (re)starts it.
The thread start is launched via an empty agent so that it is only started
if the transaction in the memoized fn succeeds. When the expiry thread
consumes a queue item it checks if the expiry time has actually expired or
gotten bumped, and only in the former case does it remove from the results
cache. As a result, the println in the memoized fn only executes for 1 and 4
early on, and then after both have gone unused for long enough -- otherwise,
the expiry time keeps getting bumped. (Each bump appends another copy to the
end of the queue.) The (- (:time item) 100) is to allow a little slop in the
timing (you'll notice wait keys getting off-by-one numbers like 999 and 301
in the transcript above).

Actually, the above 35 lines of code (I didn't count the doc-string lines)
supplies a very general expiring-cache mechanism that might be very useful
in general. As usual, I offer it into the public domain. (You'll want to get
rid of the println before using it in production code though.)

Hopefully the above also serves as a nice example of the use of refs and
agents, and also how closures allow one to encapsulate an entire little
ref-world in a self-contained package.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Memoize improvement

Reply via email to