Hi, `dedupe` is almost what you need; you can copy its source and modify it slightly:

(defn dedupe-by
  "Similar to dedupe but allows applying a function to the element by which to dedupe."
  ([f]
   (fn [rf]
     (let [pv (volatile! ::none)]
       (fn
         ([] (rf))
         ([result] (rf result))
         ([result input]
          (let [prior @pv
                cv (f input)]
            (vreset! pv cv)
            (if (= prior cv)
              result
              (rf result input))))))))
  ([f coll] (sequence (dedupe-by f) coll)))

You can then just say `(dedupe-by :value xs)`.

HTH

On Tuesday, May 17, 2016 at 11:47:06 AM UTC+2, Simon Brooke wrote:
>
> I'm having trouble with writing a function
>
>    1. in idiomatic clojure
>    2. which doesn't blow the stack
>
> The problem is I have a time series of events e.g.
>
> ({:idhistory 78758272, :timestamp #inst "2016-03-31T19:34:27.313000000-00:00",
>   :nameid 5637, :stringvalue nil, :value 8000.0}
>  {:idhistory 78756591, :timestamp #inst "2016-03-31T19:33:31.697000000-00:00",
>   :nameid 5637, :stringvalue nil, :value 7368.0}
>  {:idhistory 78754249, :timestamp #inst "2016-03-31T19:32:17.100000000-00:00",
>   :nameid 5637, :stringvalue nil, :value 6316.0}
>  {:idhistory 78753165, :timestamp #inst "2016-03-31T19:31:41.843000000-00:00",
>   :nameid 5637, :stringvalue nil, :value 5263.0}
>  {:idhistory 78751187, :timestamp #inst "2016-03-31T19:30:36.213000000-00:00",
>   :nameid 5637, :stringvalue nil, :value 4211.0}
>  {:idhistory 78749476, :timestamp #inst "2016-03-31T19:29:41.363000000-00:00",
>   :nameid 5637, :stringvalue nil, :value 3158.0} ...)
>
> which is to say, each event is a map, and each event has two critical
> keys, :timestamp and :value. The series is sorted in descending order by
> timestamp, i.e. most recent event first. These series are of up to millions
> of events; the average length of the series is about half a million events.
> However, many contain successive events at which the value does not change,
> and where the value doesn't change I want to retain only the first event.
>
> So far what I've got is:
>
> (defn consolidate-events
>   "Return a time series like this `series`, but without those events whose value is
>   identical to the value of the preceding event."
>   [series]
>   (let [[car cadr & cddr] series]
>     (cond
>       (empty? series) series
>       (=
>         (get-value-for-event car)
>         (get-value-for-event cadr)) (consolidate-events (rest series))
>       true (cons car (consolidate-events (rest series))))))
>
> Obviously, with millions of events or even merely hundreds of thousands, a
> recursive function blows the stack. Furthermore, this one isn't even tail
> call optimisable. I tried creating an inner function which I naively
> thought should be tail call optimisable, but it fails 'Can only recur from
> tail position':
>
> (defn consolidate-events
>   "Return a time series like this `series`, but without those events whose value is
>   identical to the value of the preceding event."
>   [series]
>   (remove
>     nil?
>     (let [inner (fn [series]
>                   (let [[car cadr & cddr] series]
>                     (if
>                       (not (empty? series))
>                       ;; then
>                       (cons
>                         (if
>                           (= (get-value-for-event car)
>                              (get-value-for-event cadr))
>                           ;; then
>                           nil
>                           ;; else
>                           car)
>                         (if
>                           (not (empty? series))
>                           (recur (rest series)))))))]
>       (inner series))))
>
> Test for the function is as follows:
>
> (deftest consolidate-events-test
>   (testing "consolidate-events"
>     (let [s1 [{:timestamp #inst "2016-03-31T19:34:27.313000000-00:00", :value 8000.0}
>               {:timestamp #inst "2016-03-31T19:33:31.697000000-00:00", :value 7368.0}
>               {:timestamp #inst "2016-03-31T19:32:17.100000000-00:00", :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:31:41.843000000-00:00", :value 5263.0}
>               {:timestamp #inst "2016-03-31T19:30:36.213000000-00:00", :value 4211.0}
>               {:timestamp #inst "2016-03-31T19:29:41.363000000-00:00", :value 3158.0}]
>           s2 [{:timestamp #inst "2016-03-31T19:34:27.313000000-00:00", :value 8000.0}
>               {:timestamp #inst "2016-03-31T19:33:31.697000000-00:00", :value 7368.0}
>               {:timestamp #inst "2016-03-31T19:33:17.100000000-00:00", :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:32:27.100000000-00:00", :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:32:17.100000000-00:00", :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:31:41.843000000-00:00", :value 5263.0}
>               {:timestamp #inst "2016-03-31T19:30:36.213000000-00:00", :value 4211.0}
>               {:timestamp #inst "2016-03-31T19:29:41.363000000-00:00", :value 3158.0}]]
>       (is (= s1 (consolidate-events s1)) "There are no events in s1 that can be consolidated")
>       (is (= s1 (consolidate-events s2)) "When consolidated, s2 = s1")
>       (is (not (= s2 (consolidate-events s2))) "When consolidated, s2 no longer equals s2"))))
>
> Any help gratefully accepted!
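
One caveat, with a small REPL sketch to illustrate it. `dedupe-by` keeps the first element of each run in sequence order. Because the quoted series is sorted most recent first, that is the newest event of each run, whereas the quoted test expects the oldest one (s1 keeps the 19:32:17.100 event for 6316.0). If that is what is wanted, one possible variant is sketched below. The name `dedupe-by-last`, the cut-down data `xs`, and the `:t` key (standing in for :timestamp) are made up for illustration; `dedupe-by` is repeated only so the snippet runs on its own.

```clojure
;; Same transducer as in the reply, repeated so this snippet is self-contained.
(defn dedupe-by
  ([f]
   (fn [rf]
     (let [pv (volatile! ::none)]
       (fn
         ([] (rf))
         ([result] (rf result))
         ([result input]
          (let [prior @pv
                cv (f input)]
            (vreset! pv cv)
            (if (= prior cv)
              result
              (rf result input))))))))
  ([f coll] (sequence (dedupe-by f) coll)))

;; Hypothetical variant: keep the LAST element of each run instead.
;; partition-by groups consecutive elements with equal (f x);
;; (map last) then picks the final element of every group.
(defn dedupe-by-last
  ([f] (comp (partition-by f) (map last)))
  ([f coll] (sequence (dedupe-by-last f) coll)))

;; Cut-down events, newest first; :t is a made-up stand-in for :timestamp.
(def xs [{:t 5 :value 8000.0}
         {:t 4 :value 6316.0}
         {:t 3 :value 6316.0}
         {:t 2 :value 6316.0}
         {:t 1 :value 5263.0}])

;; First of each run in sequence order (the newest event).
(dedupe-by :value xs)
;; => ({:t 5, :value 8000.0} {:t 4, :value 6316.0} {:t 1, :value 5263.0})

;; Last of each run in sequence order (the oldest event).
(dedupe-by-last :value xs)
;; => ({:t 5, :value 8000.0} {:t 2, :value 6316.0} {:t 1, :value 5263.0})

;; Both also work as transducers, e.g. eagerly into a vector.
(into [] (dedupe-by :value) xs)
```

Neither version recurses per element, so stack depth should not grow with the length of the series.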