"Meikel Brandmeyer (kotarak)" <m...@kotka.de> writes:

Hi Meikel,

> I propose to use criterium to do
> benchmarking. https://clojars.org/criterium

Thanks for the hint.  The short answer is that creating a function
composition with `comp*` (see the patch attached to CLJ-1010) is much,
much faster than creating it with `comp`, which is expected because
there's no `reverse`.  However, applying the composition created with
`comp` is ~2.5 times faster than applying the one created with `comp*`.
To make it even stranger, with my patch the arbitrary-arity version of
`comp` just calls

    (apply comp* (reverse (list* f1 f2 f3 fs)))

so there shouldn't be a difference.  Well ok, `reverse` will eliminate
laziness, but in my code from the last post, I'm constructing the list
of functions to compose by cons-ing to a list recursively, so there
shouldn't be any laziness involved.  I also tried conjoining to a list
(because it seems that iterating a clojure.lang.Cons is slower than
iterating a clojure.lang.PersistentList), but that doesn't seem to
change anything.
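
As an aside on the data structures mentioned above, here is a small
sketch of which concrete types the different ways of building the list
of functions produce.  It only inspects classes; the var names are made
up for illustration, it is not taken from the patch, and it says nothing
about timing by itself.

```clojure
;; Which concrete type does each way of building the fn list yield?
(def fns-consed    (cons inc (cons dec '())))   ; built with cons
(def fns-conjoined (conj (conj '() dec) inc))   ; built with conj on a list
(def fns-reversed  (reverse [inc dec]))         ; result of calling reverse
(def fns-lazy      (take 2 (repeat inc)))       ; the kind of seq the benchmarks build

;; Print the class name of each collection.
(doseq [[label coll] [["cons"    fns-consed]
                      ["conj"    fns-conjoined]
                      ["reverse" fns-reversed]
                      ["take"    fns-lazy]]]
  (println label "->" (.getName (class coll))))
```

Note that what a variadic function actually receives as its rest
argument under `apply` may differ from the class of the collection that
was passed in, so this is only a starting point for investigation.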

Below are all the hairy details...

Bye,
Tassilo

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

I've done benchmarking with criterium now, and the following suggests
that the left-to-right function composition `comp*` is created much
faster than the right-to-left version `comp` which needs to reverse the
seq of fns first.

--8<---------------cut here---------------start------------->8---
r-reduce> (do
           (bench (apply comp* (take 200000 (repeat identity))) :verbose)
           (println "----------------------------------------")
           (bench (apply comp (take 200000 (repeat identity))) :verbose))
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 7609380
             Execution time mean : 7.894850 us  95.0% CI: (7.892085 us, 7.897481 us)
    Execution time std-deviation : 278.514554 ns  95.0% CI: (275.961233 ns, 282.746932 ns)
         Execution time lower ci : 7.650744 us  95.0% CI: (7.649851 us, 7.650744 us)
         Execution time upper ci : 8.423008 us  95.0% CI: (8.423008 us, 8.432979 us)

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       2 (3.3333 %)
        low-mild         3 (5.0000 %)
 Variance from outliers : 22.1941 % Variance is moderately inflated by outliers
----------------------------------------
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 780
             Execution time mean : 81.983797 ms  95.0% CI: (81.961708 ms, 82.003751 ms)
    Execution time std-deviation : 2.200307 ms  95.0% CI: (2.190147 ms, 2.210750 ms)
         Execution time lower ci : 77.945103 ms  95.0% CI: (77.945103 ms, 77.945103 ms)
         Execution time upper ci : 85.109668 ms  95.0% CI: (85.109668 ms, 85.109668 ms)
nil
--8<---------------cut here---------------end--------------->8---

So `comp*` is about 10,000 times faster than `comp` when creating
compositions.

But that doesn't change much in practice.  If I use the two
implementations from my last post and these two benchmarking fns...

--8<---------------cut here---------------start------------->8---
(defn reduce-right-with-comp*
  "Conses to a list of fns and uses (apply comp* list-of-fns)."
  []
  (reduce-right - (range 200000)))

(defn reduce-right-with-comp
  "Conjoins to a vector of fns and uses (apply comp vec-of-fns) which
  reverses the vector."
  []
  (reduce-right-old - (range 200000)))
--8<---------------cut here---------------end--------------->8---

... I get:

--8<---------------cut here---------------start------------->8---
r-reduce> (do
           (bench (reduce-right-with-comp*) :verbose)
           (println "----------------------------------------")
           (bench (reduce-right-with-comp) :verbose))
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 420
             Execution time mean : 167.787313 ms  95.0% CI: (167.713185 ms, 167.874011 ms)
    Execution time std-deviation : 8.172843 ms  95.0% CI: (8.095159 ms, 8.231008 ms)
         Execution time lower ci : 156.327257 ms  95.0% CI: (156.327257 ms, 156.327257 ms)
         Execution time upper ci : 184.392968 ms  95.0% CI: (184.162446 ms, 184.392968 ms)

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       3 (5.0000 %)
        low-mild         2 (3.3333 %)
 Variance from outliers : 35.1910 % Variance is moderately inflated by outliers
----------------------------------------
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 360
             Execution time mean : 193.625805 ms  95.0% CI: (193.566549 ms, 193.688350 ms)
    Execution time std-deviation : 7.953619 ms  95.0% CI: (7.911386 ms, 7.989930 ms)
         Execution time lower ci : 184.161529 ms  95.0% CI: (184.161529 ms, 184.161529 ms)
         Execution time upper ci : 209.447091 ms  95.0% CI: (209.430756 ms, 209.447091 ms)

Found 2 outliers in 60 samples (3.3333 %)
        low-severe       2 (3.3333 %)
 Variance from outliers : 27.0944 % Variance is moderately inflated by outliers
nil
--8<---------------cut here---------------end--------------->8---

So here, the version with `comp*` is about 25 ms faster.  Considering
the first benchmark, I expected it to be about 80 ms faster, because
`(apply comp seq-of-200000-fns)` takes about 80 ms longer than the same
using `comp*`.  [I assume that consing or conjoining to a list costs
about the same as conjoining to a vector...  In fact, it's faster, so
the benefit should be even greater than 80 ms.]
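
The arithmetic behind that expectation can be sketched as follows.  The
numbers are the mean execution times copied from the benchmark output
above; the var names are made up for illustration.

```clojure
;; Mean execution times from the benchmark output, in milliseconds.
(def create-comp*  0.007894850)   ; (apply comp* ...) with 200000 fns (7.894850 us)
(def create-comp   81.983797)     ; (apply comp ...) with 200000 fns
(def rr-with-comp* 167.787313)    ; (reduce-right-with-comp*)
(def rr-with-comp  193.625805)    ; (reduce-right-with-comp)

;; Gap I expected from the creation benchmark alone, roughly 82 ms.
(def expected-gap (- create-comp create-comp*))

;; Gap actually observed between the two reduce-right variants, roughly 26 ms.
(def observed-gap (- rr-with-comp rr-with-comp*))

(println "expected gap (ms):" expected-gap)
(println "observed gap (ms):" observed-gap)
```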

Even stranger, if I increase the number to one million, the
vector-with-comp version is slightly faster.

--8<---------------cut here---------------start------------->8---
r-reduce> (do
           (bench (reduce-right-with-comp*) :verbose)
           (println "----------------------------------------")
           (bench (reduce-right-with-comp) :verbose))
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 60
             Execution time mean : 1.437824 sec  95.0% CI: (1.435744 sec, 1.439028 sec)
    Execution time std-deviation : 238.811129 ms  95.0% CI: (237.060303 ms, 240.253146 ms)
         Execution time lower ci : 1.221996 sec  95.0% CI: (1.221996 sec, 1.221996 sec)
         Execution time upper ci : 1.923629 sec  95.0% CI: (1.921390 sec, 1.923629 sec)

Found 3 outliers in 60 samples (5.0000 %)
        low-severe       3 (5.0000 %)
 Variance from outliers : 87.5968 % Variance is severely inflated by outliers
----------------------------------------
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 60
             Execution time mean : 1.345220 sec  95.0% CI: (1.343476 sec, 1.347048 sec)
    Execution time std-deviation : 200.290969 ms  95.0% CI: (198.836108 ms, 202.075390 ms)
         Execution time lower ci : 1.139444 sec  95.0% CI: (1.139444 sec, 1.139444 sec)
         Execution time upper ci : 1.929759 sec  95.0% CI: (1.929753 sec, 1.929759 sec)

Found 6 outliers in 60 samples (10.0000 %)
        low-severe       1 (1.6667 %)
        low-mild         5 (8.3333 %)
 Variance from outliers : 84.1513 % Variance is severely inflated by outliers
nil
--8<---------------cut here---------------end--------------->8---

If I increase the numbers even more, then the vector-with-comp variant
extends its lead.

This suggests that while `comp*` is much faster at creating a function
composition, the function it creates is much slower when applied.
The following benchmark seems to support that claim.

--8<---------------cut here---------------start------------->8---
r-reduce> (let [f1 (apply comp* (take 1000000 (repeat inc)))
                f2 (apply comp (take 1000000 (repeat inc)))]
            (bench (f1 0) :verbose)
            (println "---------------------------------------")
            (bench (f2 0) :verbose))
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 540
             Execution time mean : 125.550362 ms  95.0% CI: (125.534153 ms, 125.565505 ms)
    Execution time std-deviation : 1.539032 ms  95.0% CI: (1.524641 ms, 1.550026 ms)
         Execution time lower ci : 123.563049 ms  95.0% CI: (123.539807 ms, 123.563049 ms)
         Execution time upper ci : 128.757621 ms  95.0% CI: (128.757621 ms, 128.781272 ms)

Found 6 outliers in 60 samples (10.0000 %)
        low-severe       5 (8.3333 %)
        low-mild         1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
---------------------------------------
amd64 Linux 3.4.0-gentoo 2 cpu(s)
OpenJDK 64-Bit Server VM 22.0-b10
Runtime arguments: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -XX:+TieredCompilation -Xmx1G -Dclojure.compile.path=/home/horn/Repos/clj/testi/target/classes -Dtesti.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count             : 1260
             Execution time mean : 48.581007 ms  95.0% CI: (48.563129 ms, 48.599636 ms)
    Execution time std-deviation : 1.846413 ms  95.0% CI: (1.829196 ms, 1.860582 ms)
         Execution time lower ci : 47.463785 ms  95.0% CI: (47.463785 ms, 47.463785 ms)
         Execution time upper ci : 53.088105 ms  95.0% CI: (53.034291 ms, 53.088105 ms)

Found 12 outliers in 60 samples (20.0000 %)
        low-severe       2 (3.3333 %)
        low-mild         10 (16.6667 %)
 Variance from outliers : 23.8733 % Variance is moderately inflated by outliers
nil
--8<---------------cut here---------------end--------------->8---

But my patch to CLJ-1010 which implements `comp*` bases the 4-or-more
arity version of the existing `comp` on `comp*`, i.e., it's literally

    ;; comp for arbitrary arity, i.e., [f1 f2 f3 & fs]
    (apply comp* (reverse (list* f1 f2 f3 fs)))

Now how can it be that compositions created with `(apply comp ...)`
*evaluate* ~2.5 times faster than compositions created with `(apply
comp* ...)`?
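
To make the relation between the two functions concrete, here is a
minimal sketch.  The names `comp*-sketch` and `comp-sketch` are made up
for illustration, and the `reduce`-based body is an assumption, not the
code from the CLJ-1010 patch; only the `(apply ... (reverse (list*
...)))` line mirrors what is quoted above.

```clojure
;; Hypothetical stand-in for comp*: applies the fns left to right.
;; The first fn may take any number of args, the others take one.
(defn comp*-sketch
  ([] identity)
  ([f] f)
  ([f & fs]
     (fn [& args]
       (reduce (fn [acc g] (g acc)) (apply f args) fs))))

;; Right-to-left composition expressed via the left-to-right one,
;; mirroring the arbitrary-arity branch quoted above.
(defn comp-sketch
  [f1 f2 f3 & fs]
  (apply comp*-sketch (reverse (list* f1 f2 f3 fs))))

(def double-it #(* 2 %))

;; Left to right: inc first, then double-it.
(println ((comp*-sketch inc double-it) 3))
;; Right to left, like clojure.core/comp: identity, double-it, then inc.
(println ((comp-sketch inc double-it identity) 3))
```

With this sketch, both variants end up running the same closure over a
seq of fns, which is why I would not expect any difference when applying
the result.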

Bye,
Tassilo, totally stunned...
