Thanks for running the second experiment.

The more I think about it, getting "typical" performance numbers for
something like this is very difficult, just as it is for sorting
algorithms.  Beware the evils of micro-benchmarking.

Does anyone have some ideas on how to perform a more rigorous
experiment?
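
Something like this might be a start, as a rough sketch only (bench is
a made-up helper, not from any library): warm up the JIT before timing,
force full realization with doall, and average several runs:

(defn bench [f warmups runs]
  ;; warm-up runs let HotSpot compile the hot paths before we measure
  (dotimes [_ warmups] (f))
  (let [times (doall (for [_ (range runs)]
                       (let [start (System/nanoTime)]
                         (f)
                         (/ (- (System/nanoTime) start) 1e6))))]
    {:mean-msecs (/ (reduce + times) runs)
     :min-msecs  (apply min times)}))

;; e.g. (bench #(doall (better-unique (range 1000000))) 5 10)
;; doall matters: without it we would only time building the lazy seq.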

Sean

On Feb 18, 12:44 pm, Rowdy Rednose <rowdy.redn...@gmx.net> wrote:
> Yeah, the numbers depend a bit on the data, so distinct will look
> better if there are many dupes. Running tests against
>
> (mapcat vector (range 500000) (range 500000))
>
> (which starts 0 0 1 1 2 2 ..., i.e. 50% dupes, every value appearing
> twice in a row) gave me these numbers:
>
> user=> (time (count (unique (mapcat vector (range 500000) (range 500000)))))
> "Elapsed time: 4319.796441 msecs"
> 500000
> user=> (time (count (better-unique (mapcat vector (range 500000) (range 500000)))))
> "Elapsed time: 4107.580937 msecs"
> 500000
> user=> (time (count (distinct (mapcat vector (range 500000) (range 500000)))))
> "Elapsed time: 7279.815677 msecs"
> 500000
> user=> (time (count (parti-by-unique (mapcat vector (range 500000) (range 500000)))))
> "Elapsed time: 5889.013135 msecs"
> 500000
>
> So now my original unique is actually a little bit slower than better-
> unique in all the runs I did. I guess this is due to the overhead of
> atoms?
>
> Still, distinct will never be faster than better-unique, as it has to
> do more work.
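>
> For comparison, this is roughly (from memory, paraphrased) what
> clojure.core/distinct does; the extra work is the seen set it threads
> along, paying a hash lookup plus a conj for every single element:
>
> (defn distinct [coll]          ; paraphrase of the core fn
>   (let [step (fn step [xs seen]
>                (lazy-seq
>                 ((fn [[f :as xs] seen]
>                    (when-let [s (seq xs)]
>                      (if (contains? seen f)
>                        (recur (rest s) seen)
>                        (cons f (step (rest s) (conj seen f))))))
>                  xs seen)))]
>     (step coll #{})))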
>
> On Feb 19, 1:26 am, Sean Devlin <francoisdev...@gmail.com> wrote:
>
> > Rowdy,
> > Does your profiling test do any work?  There aren't any duplicates in
> > (range 1000000).  Perhaps you should run a few more benchmarks to get a
> > better idea of what is going on.
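>
> > For instance (dupe-data is a hypothetical helper, just to sketch the
> > idea), you could sweep the duplication factor and time each case:
>
> > ;; n distinct values, each repeated k times in a row
> > (defn dupe-data [n k]
> >   (mapcat #(repeat k %) (range n)))
>
> > ;; e.g. (doseq [k [1 2 10]]
> > ;;         (time (count (unique (dupe-data 100000 k)))))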
>
> > Sean
>
> > On Feb 18, 10:34 am, Rowdy Rednose <rowdy.redn...@gmx.net> wrote:
>
> > > Various interesting approaches to this problem... thanks guys!
>
> > > Stu, you're right, distinct builds up a set of all items encountered,
> > > which I don't need, and it's costly. But that function is indeed a
> > > good model. So here's a side-effect-free version modeled after
> > > distinct:
>
> > > (defn better-unique
> > >   "Returns a lazy seq of coll with consecutive duplicates removed."
> > >   [coll]
> > >   (let [step (fn step [xs last]
> > >                (lazy-seq
> > >                 ((fn [[f :as xs] last]
> > >                    (when-let [s (seq xs)]
> > >                      (if (= last f)
> > >                        (recur (rest s) f)      ; duplicate, skip it
> > >                        (cons f (step (rest s) f)))))
> > >                  xs last)))]
> > >     ;; (Object.) is a sentinel that no element can be = to
> > >     (step coll (Object.))))
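>
> > > A quick sanity check at the REPL:
>
> > > user=> (better-unique [1 2 3 3 4 5 5 6])
> > > (1 2 3 4 5 6)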
>
> > > Doing some micro-benchmarking, it turns out my initial version is the
> > > fastest, but better-unique is comparable.
>
> > > This one
>
> > > (defn parti-by-unique [sc] (map first (partition-by identity sc)))
>
> > > is much slower, and distinct is the slowest of the ones I tried, as it
> > > does much more than I need. Here are the numbers (and I don't wanna
> > > hear anybody laugh about the absolute values I achieve on my box ;)
>
> > > user=> (time (count (unique (range 1000000))))
> > > "Elapsed time: 1070.535608 msecs"
> > > 1000000
> > > user=> (time (count (better-unique (range 1000000))))
> > > "Elapsed time: 1510.92021 msecs"
> > > 1000000
> > > user=> (time (count (parti-by-unique (range 1000000))))
> > > "Elapsed time: 3344.861758 msecs"
> > > 1000000
> > > user=> (time (count (distinct (range 1000000))))
> > > "Elapsed time: 6724.705348 msecs"
> > > 1000000
>
> > > And yes, for a generic function like this, which I want to use
> > > everywhere and anywhere, I do consider performance.
>
> > > better-unique is not too far off from my initial approach, but still,
> > > why no side effects?
> > > Even though I do not know when the side effects happen (lazily or
> > > eagerly), shouldn't the order of the predicate calls be fixed in the
> > > case of filter?
>
> > > At least I'm assuming that every reset! of the last atom happens at a
> > > time such that the result of unique is not affected.
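>
> > > One thing worth checking is *when* the predicate actually runs. A
> > > small experiment (the exact count depends on the chunk size) shows
> > > that on a chunked seq, filter calls the predicate a whole chunk at
> > > a time, so the side effects fire on more elements than the consumer
> > > has actually seen:
>
> > > (def calls (atom 0))
> > > (first (filter (fn [x] (swap! calls inc) (even? x))
> > >                (range 100)))
> > > ;; => 0
> > > @calls
> > > ;; => 32 (the whole first chunk), not 1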
>
> > > On Feb 18, 11:34 pm, Stuart Halloway <stuart.hallo...@gmail.com> wrote:
>
> > > > Rowdy's question asks for less than what core/distinct delivers--he
> > > > only wanted to remove *adjacent* duplicates.
>
> > > > That said, core/distinct is a better demo of "how do I maintain state
> > > > across a lazy sequence without requiring any mutation?"
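>
> > > > The general shape, as a sketch (map-with-state is a hypothetical
> > > > helper, not in core): thread the state through a recursive
> > > > lazy-seq call instead of mutating anything:
>
> > > > (defn map-with-state
> > > >   "Calls (f state x) -> [out new-state] for each x, lazily."
> > > >   [f state coll]
> > > >   (lazy-seq
> > > >    (when-let [s (seq coll)]
> > > >      (let [[out state'] (f state (first s))]
> > > >        (cons out (map-with-state f state' (rest s)))))))
>
> > > > ;; e.g. running sums:
> > > > ;; user=> (map-with-state (fn [acc x] [(+ acc x) (+ acc x)]) 0 [1 2 3])
> > > > ;; (1 3 6)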
>
> > > > Stu
>
> > > > > Hi,
>
> > > > > On Feb 18, 3:04 pm, Rowdy Rednose <rowdy.redn...@gmx.net> wrote:
>
> > > > >> "Returns a lazy sequence of the items in coll for which (pred item)
> > > > >> returns true. pred must be free of side-effects."
>
> > > > >> So that means I should not write a function like this:
>
> > > > >> (defn unique
> > > > >>   "Returns a lazy sequence with all consecutive duplicates removed"
> > > > >>   [sc]
> > > > >>   (let [last (atom (Object.))]
> > > > >>     (filter #(let [ok (not= @last %)] (reset! last %) ok) sc)))
>
> > > > >> user=> (unique [1 2 3 3 4 5 5 6])
> > > > >> (1 2 3 4 5 6)
>
> > > > >> But in contrast to functions that can be retried (compare-and-swap
> > > > >> etc.), I don't immediately see why having side effects in filter
> > > > >> would
> > > > >> be bad. Can anybody enlighten me? And how should I do this instead?
>
> > > > > Besides the other suggestions: have a look at clojure.core/distinct
> > > > > and its implementation.
>
> > > > > Sincerely
> > > > > Meikel