user=> (defn foo [the-input-seq] (loop [s the-input-seq o [] last nil] (if (empty? s) o (let [n (first s)] (if (or (= n (second s)) (= n last)) (recur (rest s) (conj o n) n) (recur (rest s) o n)))))) #'user/foo user=> (defn bar [coll] (let [freqs (frequencies coll)] (filter #(> (freqs %) 1) coll))) #'user/bar user=> (defn baz [s] (let [x (map #(if (= %1 %2) [%1]) s (rest s))] (mapcat #(if (nil? %1) %2 %1) x (concat (rest x) [nil])))) #'user/baz user=> (time (doseq [x (range 10000)] (doall (foo '(1 2 3 4 4 5 5 5 5 5 6 9 12 12))))) "Elapsed time: 128.51932 msecs" nil user=> (time (doseq [x (range 10000)] (doall (bar '(1 2 3 4 4 5 5 5 5 5 6 9 12 12))))) "Elapsed time: 199.70168 msecs" nil user=> (time (doseq [x (range 10000)] (doall (baz '(1 2 3 4 4 5 5 5 5 5 6 9 12 12))))) "Elapsed time: 731.5074 msecs"
As expected: loop/recur fastest, frequencies in betwee, fully-lazy slowest. The loop/recur version neither creates temporary data structures (frequencies generates a map, the fully-lazy implementation an intermediate seq) nor traverses the input twice (both other versions do -- the frequencies implementation traverses once to build the frequencies map and again to weed out singletons and the fully-lazy one weeds the singletons and the first of each run into an intermediate seq, then goes over the intermediate seq prepending to each run one more copy, and for the given input the intermediate seq is nearly as long as the input). There are other considerations though. The fully-lazy one uses the least memory; the fully-lazy one's performance is probably better relative to the other two when the input is sparse in ties (but probably still slower than the other two). The frequencies one will work on an input like '(5 1 2 4 5 5 7) whereas one of the 5s will be missed by the other two. (Besides frequencies, the only implementation that occurs to me that will work on arbitrary unsorted input is one that calls sort on its input first and then uses any of the algorithms already posted. Then it will have n log n performance, so worse than any of the others on long enough inputs since everything we already posted is O(n), and will choke on inputs whose elements aren't all pairwise Comparable. Sorting by #(.compare (.hashCode %1) (.hashCode %2)) fixes the latter but not the O(n log n) performance and introduces a tiny chance of error (if 5 and 4 in the above input had the same hashcode the sort could produce '(1 2 5 4 5 5 7) for instance and a 5 still gets lost). Sorting with #(try (.compare %1 %2) (catch ClassCastException e (.compare (.toString (.getClass %1)) (.toString (.getClass %2))))) should work, but is still n log n. In fact that last comparator might be useful as a general-purpose comparator for heterogeneous colls. It won't consider e.g. 5 (Integer) and 5.0 (Double) as a tie here though, which may or may not be desirable behavior.) -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en