Thanks, I just tried using the random number generator in Incanter as
a replacement and it shaves another ~100 msec off the runtime of the
function. That's better but still noticeably slower than python.
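
For concreteness, the swap looks roughly like the untested sketch below.
It assumes incanter.stats/sample-normal as a bulk normal sampler, so
treat the exact call as an assumption rather than the code I actually
ran:

(require '[incanter.stats :as stats])

;; untested sketch: draw L*L standard-normal deviates in one call, then
;; pack them into a double[][] the same way gaussian-matrix2 does
(defn gaussian-matrix-incanter [L]
  (into-array (map double-array
                   (partition L (stats/sample-normal (* L L))))))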

I also tried applying what I've learned here to a function that adds
two 2D arrays. It's written imperatively, and I used type hints to get
rid of the reflection warnings:

(defn sum-fields [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
  (let [L (alength arr1)]
    (dotimes [i L]
      (dotimes [j L]
        (aset-double ^doubles (aget result i) j
                     (+ (aget arr1 i j) (aget arr2 i j)))))))

but it's surprisingly slow compared to numpy again.
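
One thing I suspect but haven't verified: with only the ^"[[D" hints,
(aget arr1 i j) still goes through the generic multi-index path and
hands back boxed values, so the inner loop never actually touches
primitive arrays. An untested sketch of the workaround would be to pull
each row out once and hint it as ^doubles (sum-fields-hinted is just a
name I made up):

(defn sum-fields-hinted [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
  (let [L (alength arr1)]
    (dotimes [i L]
      ;; hint each row once so the inner aget/aset-double calls see
      ;; one-dimensional primitive double arrays
      (let [^doubles row1 (aget arr1 i)
            ^doubles row2 (aget arr2 i)
            ^doubles out  (aget result i)]
        (dotimes [j L]
          (aset-double out j (+ (aget row1 j) (aget row2 j))))))))

If the boxing really is in the inner aget calls, this should get much
closer to the aset-double loop in gaussian-matrix5, but I haven't
benchmarked it.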


On Sep 21, 2:26 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
> FYI I fired up a profiler and more than 2/3 of the runtime of
> gaussian-matrix5 is going to the .nextGaussian method of
> java.util.Random.  I think there are faster (and higher-quality)
> drop-in replacements for java.util.Random, if that is really
> important to you.  I seem to recall there being a good Mersenne
> Twister implementation around, which might fit your bill.
>
> -Jason
>
> On Sep 21, 6:34 am, Ranjit <rjcha...@gmail.com> wrote:
>
> > Yeah, I spoke too soon. All the rows are identical. This is what I
> > meant to do:
>
> > (defn gaussian-matrix-for-the-nth-time [L]
> >   (into-array (map double-array
> >                    (repeatedly L #(repeatedly L next-gaussian)))))
>
> > and on my computer it takes ~800 msecs vs. ~400 msecs for
> > gaussian-matrix5 to make an array of 1000^2 gaussians.
>
> > But numpy only takes about 100 msecs to do the same thing on my
> > computer. I'm surprised we can't beat that or at least get close. But
> > maybe next-gaussian is the bottleneck as you say.
>
> > On Sep 21, 12:20 am, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > On Sep 20, 4:43 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > I'm glad you think partition is the problem, because that was my guess
> > > > too. But I think I have the answer. This is the fastest version I've
> > > > seen so far:
>
> > > > (defn gaussian-matrix-final [L]
> > > >   (into-array ^doubles
> > > >     (map double-array (repeat L (repeatedly L next-gaussian)))))
>
> > > The ^doubles type hint is wrong (it means array of doubles, not seq of
> > > arrays of doubles); the compiler is just ignoring it.
>
> > > And the reason it's so fast is probably that you're repeating the
> > > same L gaussian values L times here.  Doesn't matter how fast it runs
> > > if it returns the wrong answer :-).   Anyway, that might indicate that
> > > the .nextGaussian is actually the bottleneck in the fastest versions.
>
> > > -Jason
>
> > > > If I understand what's going on now, then it looks like the only way
> > > > to make this any faster is if next-gaussian could return primitives.
>
> > > > The for and doseq macros seem like they're pretty slow.
>
> > > > -Ranjit
>
> > > > On Sep 20, 3:30 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > I think partition is slowing you down (but haven't profiled to
> > > > > verify).  Here's a functional version that's about 70% as fast as my
> > > > > "5":
>
> > > > > (defn gaussian-matrix6 [L]
> > > > >   (to-array (for [i (range L)]
> > > > >               (into-array Double/TYPE (for [j (range L)] (next-gaussian))))))
>
> > > > > and I'd guess that's about as good as you're going to get, given that
> > > > > this approach is necessarily going to box and unbox the doubles, and
> > > > > create intermediate sequences, rather than stuffing the primitive
> > > > > doubles directly into the result array.
>
> > > > > -Jason
>
> > > > > On Sep 20, 12:00 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > Replacing the doseq's with dotimes speeds it up a little more:
>
> > > > > > (defn gaussian-matrix5 [^"[[D" arr]
> > > > > >   (dotimes [x (alength arr)]
> > > > > >     (dotimes [y (alength (first arr))]
> > > > > >       (aset-double ^doubles (aget arr (int x)) (int y) (next-gaussian)))))
>
> > > > > > but I'm getting reflection warnings on alength. I guess that
> > > > > > doesn't cause a problem because the alength calls only happen once?
>
> > > > > > Also adding type hints to the more functional version of my first
> > > > > > attempt speeds things up quite a bit:
>
> > > > > > (defn gaussian-matrix2 [L]
> > > > > >   (into-array ^doubles
> > > > > >     (map double-array (partition L (repeatedly (* L L) next-gaussian)))))
>
> > > > > > But it's still about 4x slower than gaussian-matrix5 above. There
> > > > > > must be a way to improve on the inner loop here that doesn't
> > > > > > require using indices, right?
>
> > > > > > On Sep 20, 12:32 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > Oops, I found aset-double2 with tab completion and figured it was
> > > > > > > built-in.  Forgot it was a utility I built some time ago, a stub
> > > > > > > for a Java method that does the setting.
>
> > > > > > > Also, I got the type hint for the "arr" arg wrong, although it
> > > > > > > didn't seem to matter.
>
> > > > > > > Here's a fixed version in standard Clojure that's basically as fast:
>
> > > > > > > user> (defn gaussian-matrix4 [^"[[D" arr ^int L]
> > > > > > >         (doseq [x (range L) y (range L)]
> > > > > > >           (aset-double ^doubles (aget arr (int x)) (int y)
> > > > > > >                        (.nextGaussian ^Random r))))
> > > > > > > #'user/gaussian-matrix4
> > > > > > > user> (do (microbench (gaussian-matrix3 (make-array Double/TYPE 10 10) 10))
> > > > > > >           (microbench (gaussian-matrix4 (make-array Double/TYPE 10 10) 10)))
> > > > > > > min; avg; max ms:  0.000 ; 0.033 ; 8.837    ( 56828  iterations)
> > > > > > > min; avg; max ms:  0.009 ; 0.038 ; 7.132    ( 50579  iterations)
>
> > > > > > > It seems like you should be able to just use aset-double with
> > > > > > > multiple indices (in place of aset-double2), but I can't seem to
> > > > > > > get the type hints right.
>
> > > > > > > -Jason
>
> > > > > > > On Sep 20, 7:36 am, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > > Thanks Jason, this is great.
>
> > > > > > > > I was confused earlier because I wasn't seeing reflection
> > > > > > > > warnings, but it turns out that was only because I was
> > > > > > > > evaluating the function definitions in the emacs buffer, and
> > > > > > > > the warnings weren't visible.
>
> > > > > > > > I have a question about gaussian-matrix3 though. What is
> > > > > > > > "aset-double2"? Is that a macro that has a type hint for an
> > > > > > > > array of doubles?
>
> > > > > > > > Thanks,
>
> > > > > > > > -Ranjit
> > > > > > > > On Sep 19, 5:37 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > > > Hi Ranjit,
>
> > > > > > > > > The big perf differences you're seeing are due to reflective
> > > > > > > > > calls.  Getting the Java array bits properly type-hinted is
> > > > > > > > > especially tricky, since you don't always get good reflection
> > > > > > > > > warnings.
>
> > > > > > > > > Note that aset is only fast for reference types:
>
> > > > > > > > > user> (doc aset)
> > > > > > > > > -------------------------
> > > > > > > > > clojure.core/aset
> > > > > > > > > ([array idx val] [array idx idx2 & idxv])
> > > > > > > > >   Sets the value at the index/indices. Works on Java arrays of
> > > > > > > > >   reference types. Returns val.
>
> > > > > > > > > So, if you want to speed things up ... here's your starting point:
>
> > > > > > > > > user> (set! *warn-on-reflection* true)
> > > > > > > > > true
> > > > > > > > > user> (import java.util.Random)
> > > > > > > > > (def r (Random. ))
>
> > > > > > > > > (defn next-gaussian [] (.nextGaussian r))
>
> > > > > > > > > (defn gaussian-matrix1 [arr L]
> > > > > > > > >   (doseq [x (range L) y (range L)]
> > > > > > > > >     (aset arr x y (next-gaussian))))
>
> > > > > > > > > (defn gaussian-matrix2 [L]
> > > > > > > > >   (into-array (map double-array
> > > > > > > > >                    (partition L (repeatedly (* L L) next-gaussian)))))
>
> > > > > > > > > Reflection warning, NO_SOURCE_FILE:1 - reference to field
> > > > > > > > > nextGaussian can't be resolved.
>
> > > > > > > > > user> (do (microbench (gaussian-matrix1 (make-array Double/TYPE 10 10) 10))
> > > > > > > > >           (microbench (gaussian-matrix2 10)))
> > > > > > > > > min; avg; max ms:  2.944 ; 4.693 ; 34.643    ( 424  iterations)
> > > > > > > > > min; avg; max ms:  0.346 ; 0.567 ; 11.006    ( 3491  iterations)
>
> > > > > > > > > ;; Now, we can get rid of the reflection in next-guassian:
>
> > > > > > > > > user> (defn next-gaussian [] (.nextGaussian #^Random r))
> > > > > > > > > #'user/next-gaussian
> > > > > > > > > user> (do (microbench (gaussian-matrix1 (make-array Double/TYPE 10 10) 10))
> > > > > > > > >           (microbench (gaussian-matrix2 10)))
> > > > > > > > > min; avg; max ms:  2.639 ; 4.194 ; 25.024    ( 475  iterations)
> > > > > > > > > min; avg; max ms:  0.068 ; 0.130 ; 10.766    ( 15104  iterations)
> > > > > > > > > nil
>
> > > > > > > > > ;; which has cut out the main bottleneck in gaussian-matrix2.
> > > > > > > > > ;; 1 is still slow because of its array handling.
> > > > > > > > > ;; here's a fixed version:
>
> > > > > > > > > user> (defn gaussian-matrix3 [^doubles arr ^int L]
> > > > > > > > >         (doseq [x (range L) y (range L)]
> > > > > > > > >           (aset-double2 arr (int x) (int y) (.nextGaussian ^Random r))))
> > > > > > > > > #'user/gaussian-matrix3
>
> > > > > > > > > user> (do (microbench (gaussian-matrix1 (make-array Double/TYPE 10 10) 10))
> > > > > > > > >           (microbench (gaussian-matrix2 10))
> > > > > > > > >           (microbench (gaussian-matrix3 (make-array Double/TYPE 10 10) 10)))
> > > > > > > > > min; avg; max ms:  2.656 ; 4.164 ; 12.752    ( 479  iterations)
> > > > > > > > > min; avg; max ms:  0.065 ; 0.128 ; 9.712    ( 15255  iterations)
> > > > > > > > > min; avg; max ms:  0.000 ; 0.035 ; 10.180    ( 54618  iterations)
> > > > > > > > > nil
>
> > > > > > > > > ;; which is 100x faster than where we started.
>
> > > > > > > > > A profiler is often a great way to figure out what's eating up time.
> > > > > > > > > Personally, I've never found the need to use a disassembler.
>
> > > > > > > > > Cheers, Jason
>
>
