Re: Find all ties in a sorted list

Ken Wesson Wed, 10 Nov 2010 12:59:16 -0800

user=> (defn foo [the-input-seq] (loop [s the-input-seq o [] last nil]
 (if (empty? s)
   o
   (let [n (first s)]
     (if (or (= n (second s)) (= n last))
       (recur (rest s) (conj o n) n)
       (recur (rest s) o n))))))
#'user/foo
user=> (defn bar [coll]
 (let [freqs (frequencies coll)]
   (filter #(> (freqs %) 1) coll)))
#'user/bar
user=> (defn baz [s] (let [x (map #(if (= %1 %2) [%1]) s (rest s))]
 (mapcat #(if (nil? %1) %2 %1) x (concat (rest x) [nil]))))
#'user/baz
user=> (time (doseq [x (range 10000)] (doall (foo '(1 2 3 4 4 5 5 5 5
5 6 9 12 12)))))
"Elapsed time: 128.51932 msecs"
nil
user=> (time (doseq [x (range 10000)] (doall (bar '(1 2 3 4 4 5 5 5 5
5 6 9 12 12)))))
"Elapsed time: 199.70168 msecs"
nil
user=> (time (doseq [x (range 10000)] (doall (baz '(1 2 3 4 4 5 5 5 5
5 6 9 12 12)))))
"Elapsed time: 731.5074 msecs"


As expected: loop/recur fastest, frequencies in betwee, fully-lazy
slowest. The loop/recur version neither creates temporary data
structures (frequencies generates a map, the fully-lazy implementation
an intermediate seq) nor traverses the input twice (both other
versions do -- the frequencies implementation traverses once to build
the frequencies map and again to weed out singletons and the
fully-lazy one weeds the singletons and the first of each run into an
intermediate seq, then goes over the intermediate seq prepending to
each run one more copy, and for the given input the intermediate seq
is nearly as long as the input).

There are other considerations though. The fully-lazy one uses the
least memory; the fully-lazy one's performance is probably better
relative to the other two when the input is sparse in ties (but
probably still slower than the other two). The frequencies one will
work on an input like '(5 1 2 4 5 5 7) whereas one of the 5s will be
missed by the other two. (Besides frequencies, the only implementation
that occurs to me that will work on arbitrary unsorted input is one
that calls sort on its input first and then uses any of the algorithms
already posted. Then it will have n log n performance, so worse than
any of the others on long enough inputs since everything we already
posted is O(n), and will choke on inputs whose elements aren't all
pairwise Comparable. Sorting by #(.compare (.hashCode %1) (.hashCode
%2)) fixes the latter but not the O(n log n) performance and
introduces a tiny chance of error (if 5 and 4 in the above input had
the same hashcode the sort could produce '(1 2 5 4 5 5 7) for instance
and a 5 still gets lost). Sorting with #(try (.compare %1 %2) (catch
ClassCastException e (.compare (.toString (.getClass %1)) (.toString
(.getClass %2))))) should work, but is still n log n. In fact that
last comparator might be useful as a general-purpose comparator for
heterogeneous colls. It won't consider e.g. 5 (Integer) and 5.0
(Double) as a tie here though, which may or may not be desirable
behavior.)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Find all ties in a sorted list

Reply via email to